Download PDF
Research Article  |  Open Access  |  28 Jan 2024

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Views: 268 |  Downloads: 63 |  Cited:   0
Green Manuf Open 2024;2:2.
10.20517/gmo.2023.091901 |  © The Author(s) 2024.
Author Information
Article Notes
Cite This Article

Abstract

End-of-life products recycling can reduce the waste of resources, and disassembly line scheduling planning can effectively improve the recycling efficiency and reduce the pollution of the environment. This work addresses a bi-objective disassembly line scheduling problem with considering time interference between tasks. The weighted sum of the cycle time and hazard coefficients is optimized. First, a mathematical model of the disassembly line scheduling problem is established under the constraints of priority and time interference relationships. Second, four meta-heuristics are improved to solve the concerned problems, including particle swarm optimization, artificial bee colony, genetic algorithm and variable neighborhood search. Ten objective-oriented local search operations are designed for improving meta-heuristics’ performance. A reinforcement learning algorithm, Sarsa, is employed to guide task assignment among workstations and local search selection during iterations, respectively. Finally, experiments are carried out for 10 instances with different scales. The effectiveness of the improving strategies is verified; the meta-heuristics combined with Sarsa based task assignment and local search strategies has better robustness and stability than the classical ones. Comparisons and discussions show that the particle swarm optimization with improved strategies outperforms other algorithms.

Keywords

Disassembly line scheduling, meta-heuristics, Sarsa algorithm, bi-objective

INTRODUCTION

As economic and societal development progresses, governments are introducing regulations aimed at safeguarding the environment, while people’s consciousness regarding environmental protection is steadily increasing. The emergence of recycling economy is playing a pivotal role in propelling the growth of the remanufacturing industry[1]. The rapid pace of product obsolescence, coupled with technological advancements, is considerably shortening the lifespan of products[2]. In contrast to methods such as incineration and landfill, the processes of disassembly and recycling offer notable economic and environmental benefits. Consequently, there is a growing demand for the proper treatment of a substantial volume of discarded products. In this context, the recycling of End-of-Life (EOL) products holds immense significance, yielding advantages for both environmental preservation and the principles underpinning the circular economy. Since the recycling trend is unavoidable, the disassembly of EOL products emerges as a pivotal undertaking, bearing considerable research importance[3,4].

To address the challenges posed by large-scale disassembly, Güngör and Gupta introduce the concept of disassembly lines, underscoring their vital role in product disassembly and recycling[5]. Disassembly sequence planning focuses on optimizing the disassembly sequence of EOL products’ components, aiming to minimize disassembly costs, enhance recycling efficiency, and maintain a reasonable level of stability throughout the disassembly process[6]. Planning and scheduling EOL products’ disassembly order is a challenge to enhance the efficiency and stability of disassembly lines. The disassembly line scheduling problem (DLSP) has been proven to be NP-hard[7]. Literature[8] introduces the concept of sequentially-dependent disassembly line balance problems (DLBPs) and formulates the corresponding mathematical models. Emrah[9] applies constraint programming for the first time to the disassembly line balancing problem, improving the optimal solution for several medium-sized benchmark instances. ÇİL presents a mixed-integer linear programming (MILP) model for multiplayer disassembly line balancing and develops constraint programming methods to solve large-scale instances[10]. Meng et al. present a mixed-integer linear model for solving disassembly planning and scheduling problems[11]. Some publications tackle DLSP using diverse traditional mathematical methods[12-14]. However, they are not suitable for the challenges from large-scale instances in real-life situations[15]. Compared to the traditional mathematical optimization methods, the meta-heuristics are suitable for a wider range of problem structures and can obtain high-quality solutions for large-scale problems quickly. However, meta-heuristics are usually only able to find near-optimal solutions. In recent years, many kinds of advanced meta-heuristics have been proposed for the exploration of decision-making problems in various domains. Chen et al. propose a self-adaptive fast fireworks algorithm (SF-FWA) that enables linear computational complexity in terms of problem dimensionality and makes the overall able to automatically adapt to a rich set of function landscapes[16]. Singh et al. solve the proposed multi-objective mathematical model for resource allocation through exact approaches and meta-heuristics[17]. Pasha et al. design a customized multi-objective hybrid meta-heuristic that directly considers problem-specific attributes to solve a multi-objective optimization model of the vehicle path problem in a box factory[18]. Singh et al. extend the research by proposing a novel ant-based generative structural hyper-heuristic to investigate how different pheromone graphs affect their performance[19]. Experiments demonstrate key differences in performance between two different pheromone spaces. Furthermore, researchers in various fields solve decision-making problems through meta-heuristics[20-27].

ZMeta-heuristics are gradually applied for solving disassembly scheduling problems due to their excellent performance in balancing the computation cost and solution quality[28-30], including particle swarm optimization (PSO)[31-36], artificial bee colony (ABC)[37-40], genetic algorithm (GA)[41-46], and variable neighborhood search (VNS)[40,47-50]. Kalayci et al. propose a PSO based on variable neighborhood mutation operators to solve DLBPs[31]. A problem-specific swarm optimization method is designed to modify the adaptive parameters in disassembly sorting, thereby controlling the update mechanism of the disassembly process[34]. Wang et al. establish a DLBP model that balances the economy and environment, considering the precedence constraints, and develop a discrete multi-objective ABC algorithm to solve the problem[37]. For solving the DLSP, Zhang et al. use a hybrid graph to represent constraints and precedence relationships and propose a hybrid ABC algorithm[39]. Slama et al. propose a block-based GA to solve the DLSP[41]. The DLSP is efficiently addressed by Wu et al. through the proposal of a hybrid local search GA, along with the implementation of a four-layer encoding and decoding strategy[42]. A new genetic simulated annealing algorithm is proposed by Wang et al. to optimize the model[46]. According to the advantages of GA, two-point mapping crossover and single-point insertion mutation operations are constructed to guarantee the sequence’s priority and disassembly constraints. To balance the disassembly line, Ren et al. combine variable local search algorithms with multi-criteria decision making to propose an improved general VNS (GVNS)[48]. Liu et al. formulate a disassembly planning model that revolves around the achievement rate of disassembly[50]. They introduce a comprehensive variable local search algorithm and integrate four distinct local search operators into their approach to mitigate the impact of unforeseeable variables.

In many publications, traditional mathematical methods and meta-heuristics are used to solve various scheduling and optimization problems, aiming to obtain optimal or approximate optimal solutions. In recent years, reinforcement learning algorithms have been employed separately or combined with meta-heuristics for addressing various scheduling problems, including disassembly scheduling problems. Tuncel et al. utilize a reinforcement learning algorithm based on Monte Carlo to address the disassembly line problem[51]. It can solve large-scale problems within a reasonable time frame. Zhao et al. employ ensemble reinforcement learning to tackle the challenge of structural uncertainty in EOL products, effectively adapting to the optimal disassembly sequence[52]. The literature[53] combines a brainstorming optimization algorithm with reinforcement learning and uses a reward feedback mechanism to guide the selection of four mutation strategies. By integrating metaheuristics and machine learning techniques, Karimi-Mamaghan et al. solve combinatorial optimization problems[54]. Furthermore, some researchers employ metaheuristics in conjunction with Q-learning strategies within the framework of reinforcement learning to effectively address scheduling problems[55-57].

In summary, most research has predominantly centered on optimizing the disassembly sequence or maximizing factors such as disassembly time, energy consumption, or profit across multiple objectives. Yet, comparatively less attention is directed toward assessing the impact of environmental protection pressures on disassembly time and costs, especially for potential harm from hazardous products. To address this gap, this study focuses on two objectives: cycle time (CT) and the hazard coefficients. The hazard coefficients are related to the position of the hazard tasks in a solution and the average disassembly time of all tasks. The contributions of this paper are given as follows:
(1) A mathematical model is developed for DLSP with optimizing the weighted sum of CT and hazard coefficients.
(2) For task assignment, two workstation allocation strategies are first designed, and Sarsa algorithm is used to select a premium one during iterations.
(3) Sarsa algorithm is employed to guide the selection of ten local search operators, which are designed for improving convergence of meta-heuristics.

The remaining sections of this study are structured as follows: Section “INTRODUCTION” presents a mathematical model for optimizing DLSP with CT and hazard coefficients. In Section “METHODS”, the proposed algorithms are described. Experimental results and comparisons are provided in Section “RESULTS”. Finally, Section “CONCLUSIONS AND FUTURE WORK” offers a summary of this work with several future directions.

METHODS

Problem description

Disassembly is a comprehensive manufacturing process that detaches a variety of components at multiple workstations. During a disassembly process, n tasks need to be allocated among m workstations. To streamline the complexity of the problem, we assume that all workstations can disassemble any task and are functionally identical[40]. A disassembly task involves a series of operations. Once a task is completed, the resulting components or sub-components can be further disassembled into their constituent parts until the core is entirely disassembled. In essence, the disassembly process is a progressive dismantling of the core into its individual components and subassemblies through a sequence of tasks. Upon satisfying the constraints, we allocate the disassembly tasks to specific workstations on the disassembly line, adhering to the disassembly order that meets the requirements. Once a task is assigned to a workstation, it becomes fixed and cannot be transferred to another workstation. It must be completed at the workstation. The number of available workstations remains constant, while different disassembly sequences can result in variations in disassembly time and hazard coefficients. The DLSP includes three sub-problems: (1) assigning tasks to workstations; (2) task sequencing; and (3) adjusting the task sequence to minimize the CT and hazard coefficients.

In DLSP, there are priority and time interference relationships between disassembly tasks. An example is shown in Figure 1, and the detail data is reported in Table 1. The numbers inside the circle represent disassembly tasks, and the solid arrow represents the priority order between tasks. Task 1 is the predecessor of Tasks 2, 3, and 4, indicating that Task 1 must be disassembled before proceeding with Tasks 2, 3, and 4. Tasks with the same background lines have a time interference relationship between them. Table 1 provides a detailed description of the example. For example, if Task 4 is disassembled first, it will increase the disassembly time of Task 5 by two units, and the actual disassembly time of Task 5 is 3 + 2 = 5 units. Conversely, if Task 5 is disassembled first, it will also increase the actual disassembly time of Task 4. Figure 2 shows the Gantt chart of a solution for the example.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 1. Task disassembly sequence diagram.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 2. The Gantt chart for task 7.

Table 1

The process of task disassembly

WorkstationDisassembly sequenceRun timeIdle timeCTHazard coefficients
1346257
15351382123.14
25 + 13 + 2115
34266
Actual disassembly time

Mathematical models

In this section, a mathematical model of DLSP is developed with minimizing CT and hazard coefficients. The following notations s are used to express the concerned problems, and the detail data is reported in Table 2.

Table 2

Notations description

NotationDescription
Parameters
iTask index, i ∈ {1, 2, …, N}.
jTask index, j ∈ {1, 2, …, N}.
kIndex of workstation, k ∈ {1, 2, …, M}.
MWorkstation number.
NTask number.
tiDisassembly time for task i.
taThe average processing time of all tasks.
sdijThe increasing disassembly time of task i, if task j interferes with task i, and j is disassembled before task i.
uijIf task i and task j have an interference relationship, uij = 1; otherwise, uij = 0.
tiThe actual disassembly time of task i is the sum of the standard time and the interference time.
BiThe start time for disassembly task i.
FiThe completion time of task i.
TkThe completion of workstation k.
GAn infinite positive number.
Pllth element in a disassembly sequence, e.g., for sequence {1, 4, 2, 3, 6, 8, 7, 5}, P2 = 4.
CTIndicates the maximum completion time of all opening workstations.
DijIf task i can be disassembled before task j, Dij = 1; otherwise, Dij = 0.
Decision variables
xikIf task i is assigned to workstation k, xik = 1, else xik = 0.
$$ h_{p_{l}} $$if the lth element in a disassembly sequence is hazardous, $$ h_{p_{l}} $$ = 1; otherwise, $$ h_{p_{l}} $$ = 0.
wijIf task i is disassembled before task j, wij = 1, else wij = 0.
yikjThe disassembly sequential relationship between tasks i, j. If part i is a precursor part of j and i is assigned to workstation k, yikj = 1, else yikj = 0.

With defined notations, the mathematical representation of the DLSP can be formulated, as shown in Table 3.

Table 3

The mathematical model of the DLSP

min f1 = λf2 + ωf3(1)
min f2 = CT = max{Tk | 1 ≤ kM}, k ∈ {1, 2, …, M}(2)
min f3 = $$ \sum_{l=1}^{N} $$ (l × $$ h_{p_{l}} $$ × ta)(3)
wij + wji = 1, ij, i, j ∈ {1, 2, …, N}(4)
ti’ ≥ ti + sdji × wij × uij, i, j ∈ {0, 1, …, N}, ij(5)
wijDij, i, j ∈ {0, 1, …, N}(6)
wii = 0, i ∈ {0, 1, …, N}(7)
ta = $$ \sum_{l=1}^{N} $$ti/N, i ∈ {1, 2, …, N}(8)
BjFi - G × (1 - wij), i, j ∈ {0, 1, …, N}(9)
Fi = Bi + ti’, i ∈ {0, 1, …, N}(10)
$$ \sum_{k=1}^{M} $$xik = 1, i ∈ {1, 2, …, N}(11)
Tk = max{Fi × xik}, i ∈ {1, 2, …, N}, k ∈ {1, 2, …, M}(12)
xik ∈ {0, 1}, i ∈ {0, 1, …, N}, k ∈ {1, 2, …, M}(13)
wij ∈ {0, 1}, i, j ∈ {0, 1, …, N}(14)
yikj ∈ {0, 1}, i, j ∈ {0, 1, …, N}, k ∈ {1, 2, …, M}(15)
t’ ≥ 0, Bi ≥ 0, B0 = 0, Fi ≥ 0, Tk ≥ 0, i, j ∈ {0, 1, …, N}, k ∈ {1, 2, …, M}(16)

Objective function (1) is calculated by assigning different weights to objective functions (2) and (3), where λ + ω = 1. Objective function (2) is to maximize the actual completion time among the running workstations. Objective function (3) is the total hazard coefficients of all hazard tasks. Constraint (4) indicates that there is a precedence relationship between task i and task j in the disassembly process. Constraint (5) states that the actual disassembly time for a task is the sum of its standard time and the interference time. Constraint (6) means that the processing order of tasks needs to satisfy the disassembly priority relationship. Constraint (7) ensures that the task is executed only once. Constraint (8) calculates the average processing time of all tasks. Constraint (9) enforces that the start time of a subsequent task should be greater than or equal to the completion time of the previous one. Constraint (10) states that the completion time of a task is equal to the sum of its start and the actual disassembly time. Constraint (11) defines that each task can be assigned to only one workstation. Constraint (12) represents the actual completion time of workstations. Constraints (13) to (16) give the sign constraints on the decision variables.

Solution representation

In DLSP, the order of disassembly tasks is optimized to minimize the CT and hazard coefficients, while ensuring the priority constraint of tasks. For example, there is an initial disassembly order, s = (7, 3, 4, 6, 2, 5, 1), representing the disassembly tasks in Figure 1. As depicted in Figure 1, s does not meet the priority of tasks. Consequently, by considering prioritization of tasks appropriately, s can be adjusted to s’ as follows: s’ = (1, 3, 4, 6, 2, 5, 7).

According to the characteristics of fixed workstations for this problem, two workstation assignment schemes are designed.

(1) The first step is to allocate the tasks among workstations randomly. Subsequently, based on task order and priority relationships, the tasks are sequentially disassembled on each workstation. According to the sequence s’, seven tasks were assigned to three workstations with 2, 2, and 3 tasks, respectively. The workstation 1 is for tasks 1 and 3, while workstation 2 processes tasks 4 and 6. The remaining three tasks are disassembled on workstation 3, as depicted in Figure 3. The detailed data is reported in Table 4.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 3. The Gantt chart for the random workstation assignment scheme. CT: Cycle time.

Table 4

Task disassembly process for random workstation allocation scheme

WorkstationDisassembly sequenceRun timeIdle timeCTHazard coefficients
1346257
155 + 11161723.14
234710
323 + 25125
Actual disassembly time

(2) The tasks in a feasible sequence are assigned to the workstation list one by one. In one round, each workstation has only one task. A new round starts if there are remaining tasks in the sequence. Finally, all tasks can be assigned to all workstations. For example, according to the sequence s’ = (1, 3, 4, 6, 2, 5, 7), the total number of tasks is 7, and workstation number is 3. In the first round, task 1 is assigned to workstation 1, task 3 is assigned to workstation 2, and task 4 is assigned to workstation 3. In the next round, task 6 is assigned to workstation 1; the remaining tasks are assigned in the same way until all tasks are assigned. The final Gantt chart is depicted in Figure 4. The detailed data reported is in Table 5.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 4. The Gantt chart for the second workstation assignment scheme. CT: Cycle time.

Table 5

Task disassembly process for the second workstation allocation scheme

WorkstationDisassembly sequenceRun timeIdle timeCTHazard coefficients
1346257
15451431723.14
252 + 189
333 + 289
Actual disassembly time

To clearly demonstrate the performance improvement of meta-heuristics combined with Sarsa, an example of 8 tasks is solved. Sarsa’s reward feedback can be utilized to select the appropriate task assignment for the problem size and through better choosing local search to obtain higher quality solutions. From Figure 5 to Figure 7, it can be clearly seen that the results by the meta-heuristics with Sarsa strategies are better than the algorithm without Sarsa strategies.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 5. The Gantt chart for task 8. CT: Cycle time.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 6. The Gantt chart for task 8 applying only meta-heuristics. CT: Cycle time.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 7. The Gantt chart for task 8 applying the combination of metaheuristics with Sarsa. CT: Cycle time.

Meta-heuristics

This section describes four meta-heuristics: PSO, ABC, GA, and VNS. They start from parameter and population initialization. Then, the initial solutions are evaluated. After that, new solutions are generated by algorithm-specific strategies to update population. These steps are repeated until the termination condition is satisfied. The unified framework of the four algorithms is presented in Figure 8.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 8. The unified framework of the four algorithms.

Sarsa

As a reinforcement learning algorithm, Sarsa is introduced to improve the adaptability and performance of Q-learning[58]. The main difference between Sarsa and Q-learning is the strategy to update Q-value. The schematic representation of the Sarsa framework is depicted in Figure 9.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 9. The framework of Sarsa.

The updating formula of Q-value in Sarsa is expressed as Equation (17).

$$ Q(S_{t},A_{t})\longleftarrow Q(S_{t},A_{t})+\alpha [R+\gamma Q(S_{t+1},A_{t+1})-Q(S_{t},A_{t})] $$

where Q (St, At) is the Q-value of taking an action At at state St, α represents the learning rate, and R is the reward after executing action At. According to the problem of DLSP, the reward value formula is designed as Equation (18). γ is the discount factor, which takes values in the range [0,1].

$$ R=\begin{cases} (Target^{current}-Target^{new})/10,\quad Target^{new}< Target^{current}\\ 1 \quad \quad Target^{new}= Target^{current}\\ 0 \quad \quad Target^{new}> Target^{current} \end{cases} $$

Different from Q-learning, Sarsa can select an action under state St+1 according to different distributions of possible future returns for the current state, rather than directly selecting the largest expected Q-value to execute actions under the next state St+1. When updating the Q-table, Sarsa is still selecting the next action on the Q-table, which has not been updated yet. However, Q-learning chooses the next action on the updated Q-table.

For DLSP problems, we design a strategy π based on a probability distribution and set four values P = (0.7, 0.8, 0.9, 1), respectively. The action with the highest expected value is selected in the probability of P, and random selection is executed under the probability of (1 - P), as depicted in Equation (19).

$$ A_{t+1}\sim \pi (\cdot \mid S_{t+1}) $$

Sarsa based task assignment

Two workstation assignment schemes are presented in Section “Solution representation”. The Sarsa algorithm is used to select an appropriate workstation assignment scheme (action) during meta-heuristics’ iterations. The initial Q-table is shown in Table 6. Each solution corresponds to a state S, while each action A represents a workstation assignment scheme. If a solution is improved by executing a workstation allocation strategy, a reward is obtained and the corresponding Q-value is updated. The ratio of the workstation allocation scheme being selected for the next iteration is increased. Conversely, if a solution is not improved, there is no reward and its selection probability in the next iteration decreases. The π strategy is employed for action selection. Through continuous exploration and feedback in the learning process, the algorithm can discern better task allocation methods tailored to different requirements. Algorithm 1 describes the steps of this task assignment strategy.

Algorithm 1. Sarsa based task assignment
Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Table 6

Initial Q-table

a1a2a3$$ \dots $$an
Q (st, at) =s1000$$ \dots $$0
s2000$$ \dots $$0
s3000$$ \dots $$0
$$ \vdots $$$$ \vdots $$$$ \vdots $$$$ \vdots $$$$ \ddots $$
sn000$$ \dots $$0

Sarsa based local search

This article introduces ten local search operators for enhancing the performance of the meta-heuristics, which includes the adjustments for CT time and hazard coefficients. The detailed procedures of ten local searches are delineated below:
(1) Swap: Swap the positions of two random tasks in the sequence [Figure 10A].
(2) Double swap: Repeat the swap operation twice [Figure 10B].
(3) Insertion: Randomly take out a task from a sequence and insert it into a new position in the sequence [Figure 10C].
(4) Bind insertion: Randomly select two consecutive tasks from a task sequence and insert them at the position with the lowest desired target value [Figure 10D].
(5) Block insertion: Randomly select multiple consecutive tasks from a task sequence and insert them at the position with the lowest desired target value [Figure 10E].
(6) Insert sequentially: Randomly select multiple tasks in a task sequence and insert them into the sequence one by one with the lowest desired target value [Figure 10F].
(7) Inverse: Randomly select several consecutive tasks from a task sequence in their reverse order [Figure 10G].
(8) Sort: Adjust the hazard task to the next position that satisfies the priority relationship [Figure 10H].
(9) Random sort: Randomly adjust the hazardous task to the sequence position with satisfying the priority relationship [Figure 10I].
(10) Sort sequentially: Adjust the hazardous tasks in sequence to a new position with satisfying the priority relationship and the lowest target value [Figure 10J].

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 10. The framework of Sarsa.

Sarsa is employed to select premium local search operators during iterations. The solutions are as states while ten local search operators are taken as actions. In the learning process, the π strategy is used to select actions for the next state. If the current action gets a better solution, there is a position reward, increasing its probability of being selected in the next iteration. In contrast, the probability of being selected in the next iteration is decreased.

If the sequence obtained by local search does not satisfy the priority relationship, we need to update it to a feasible solution. The steps of Sarsa based local search are described in Algorithm 2.

Algorithm 2. Sarsa based local search
Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Framework of the proposed algorithms

The improved strategies are embedded in four meta-heuristics. The unified framework of them is shown in Figure 11. All algorithms start from an initial population and iteratively update population with their respective strategies. During iterations, two Sarsa based strategies are employed to improve the exploration and exploitation of meta-heuristics. Finally, the best result is output if the termination condition is met.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 11. The framework of the proposed algorithms.

RESULTS

Experimental setup

To verify the effectiveness of the proposed strategies, ten instances with different scales are solved[59]. All the algorithms are compiled in C++; the running platform is a desktop computer with an Intel Core i7-10,700 CPU @ 2.90 GHz and 16 GB of RAM under Microsoft Windows 11.

To ensure a fair comparison, all algorithms are executed for the same termination time, four seconds, while their parameters are experimentally determined to achieve the optimal combination. The algorithm’s performance is assessed by comparing its average value, minimum value, and the coefficient of variation (CV) in 20 runs. The smaller the CV value, the better the stability and robustness of the algorithm is.

The formula for calculating CV is as follows:

$$ CV=\frac{SD}{AVE}\times 100% $$

where AVE is the average of the target values obtained in 20 runs, and SD is the corresponding standard deviation.

Parameter setting

The orthogonal experiment design method is used to test the influence of parameters on the performance of the four combined algorithms. Taking the SPSO_SD as an example, there are four parameters, which are limited iterations (L), population size (Ps), learning rate (α), and discount rate (γ). Each parameter is set to four values, L ∈ {10, 20, 30, 40}, Ps ∈ {10, 20, 30, 40}, α ∈ {0.2, 0.4, 0.6, 0.8}, and γ ∈ {0.05, 0.1, 0.15, 0.2}. The orthogonal matrix L16 (44) is shown in Table 7. By solving each group of parameters in the SPSO_SD and analyzing the target values obtained from experimental results, a corresponding main effect diagram is constructed, as depicted in Figure 12. It is evident that the SPSO_SD has the best results under the parameter combination, L = 20, Ps = 20, α = 0.8, and γ = 0.1, which is adopted for further test and comparisons.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 12. The main effects plot for SPSO_SD.

Table 7

Orthogonal experiments for parameter setting

No.LPsαγTarget
110100.20.056,564
220100.40.107,525
330100.60.156,455
440100.80.205,433
510200.40.154,386
620200.20.202,765
730200.80.057,188
840200.60.108,576
910300.60.208,421
1020300.80.156,144
1130300.20.105,053
1240300.40.057,799
1310400.80.103,402
1420400.60.054,316
1530400.40.208,608
1640400.20.158,584

In ABC, there are three parameters, employed bees (Ep), onlooker bees (Op), and scout bees (Sp), Op = 1 - Ep - Sp. By the trend of the parameter hierarchy of Figure 13, we choose the combination Ep = 0.5, Op = 0.2, and Sp = 0.3. In GA, there are two important parameters: crossover probability and mutation probability. After parameter tuning experiments, we choose Pc = 0.7 and Pm = 0.3, as depicted in Figure 14. In VNS, the maximum number of iterations and the number of domain operations have an important impact on the performance, and experimentally, the algorithm performs best when L = 20 and M = 30, as depicted in Figure 15.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 13. Parameter level trend of ABC. ABC: Artificial bee colony.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 14. Parameter level trend of GA. GA: Genetic algorithm.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 15. Parameter level trend of VNS. VNS: Variable neighborhood search.

We design a strategy π based on probability distribution. To choose a better probability setting, the largest instance is solved under six probabilities (P = 0.5, 0.6, 0.7, 0.8, 0.9, 1) with 20 runs independently. The first 20 optimal values under six probabilities are selected to judge the number of distributions in each probability, as shown in Figure 16. The best four probability values are selected, (P = 0.7, 0.8, 0.9, 1).

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 16. The number proportion of the optimal values of six probabilities.

Verifying the effectiveness of improved strategies

To verify the effectiveness of the proposed strategies, four meta-heuristics are compared to their respective variants. For all cases, the average (Ave) and minimum (Min) values of results are reported in Tables 8-11. From Table 8, SPSO_SD has the best results, with the best results for all cases and the smallest mean values. As shown in Table 9, SABC_SD has the smallest mean values for all cases and gets the best results in seven examples. The SABC_RD gets the best results for one case, while ABC_SD gets the best results in four cases. As reported in Table 10, SGA_SD obtains the smallest mean values for all cases and achieves the best results for five cases. The SGA_RD achieves the best results for only 1 case, while the GA_SD gets the best results for five cases. In Table 11, SVNS_SD has the smallest mean values for all cases and achieves the best result for seven cases. The SVNS_RD achieves the best result for one case. The VNS_SD achieves the best result for two cases while the VNS does not obtain the best mean value and minimum value for any case. It can be concluded that the meta-heuristics combined with Sarsa based task assignment and local search strategies have better robustness and stability than the compared ones.

Table 8

Internal comparison of PSO algorithm

TasksPSOSPSO_RDPSO_SDSPSO_SD
AveMinAveMinAveMinAveMin
2596.7069.0048.6035.0066.6052.0047.6532.00
2768.9536.0058.3535.0067.5047.0048.9027.00
36130.7555.0096.5065.0082.0547.0079.9032.00
47406.95357.00368.30343.00235.20427.00358.85241.00
5119,936.9516,538.0015,067.1512,411.0015,790.508,716.0013,557.108,082.00
5512,709.254,439.008,992.806,892.0010,276.604,095.007,074.253,431.00
66185.7070.0079.1554.0073.3547.0059.0041.00
701,417.25619.00514.15443.001,104.15778.00406.65338.00
798,933.204,452.003,669.452,995.006,384.454,003.002,765.602,173.00
8342,163.3037,139.0025,487.1523,013.0038,521.5530,081.0023,561.0517,316.00
Table 9

Internal comparison of ABC algorithm

TasksABCSABC_RDABC_SDSABC_SD
AveMinAveMinAveMinAveMin
2578.9544.0077.0549.0067.4044.0050.3032.00
2765.3545.0061.2036.0064.5541.0046.3028.00
36114.5564.00106.1566.0088.8534.0078.0532.00
47325.65160.00400.85244.00314.40145.00252.85142.00
5120,681.8013,558.0019,207.4013,500.0019,485.6510,062.0015,820.6512,411.00
5513,492.054,383.008,385.503,431.009,209.255,088.008,056.403,431.00
66150.5582.00124.8550.00102.9543.0079.4543.00
701,456.90784.00780.95331.00824.40323.00502.40331.00
797,894.503,759.005,679.852,489.005,489.552,157.004,316.852,022.00
8339,651.2533,843.0033,654.6527,724.0036,514.0532,823.0030,152.2018,980.00
Table 10

Internal comparison of GA algorithm

TasksGASGA_RDGA_SDSGA_SD
AveMinAveMinAveMinAveMin
2599.2546.0097.7570.0078.2044.0076.4541.00
2779.5538.0081.8547.0056.2026.0056.5027.00
36126.1554.00125.4586.00115.0538.00109.6042.00
47430.50250.00512.45323.00408.10100.00355.45145.00
5121,068.858,514.0020,795.4014,013.0023,868.206,804.0018,098.508,158.00
5513,754.806,701.0011,878.708,340.0012,657.456,587.0010,375.503,770.00
66171.3075.00151.2545.00164.3552.0099.3545.00
701,542.10830.001,175.40345.001,056.60298.00869.10345.00
7910,334.105,658.007,799.502,922.007,925.754,057.005,816.802,658.00
8348,563.3542,158.0034,561.0532,401.0038,771.8536,486.0032,511.6026,868.00
Table 11

Internal comparison of VNS algorithm

TasksVNSSVNS_RDVNS_SDSVNS_SD
AveMinAveMinAveMinAveMin
25110.3578.0072.2544.0079.7554.0050.2032.00
2764.7538.0064.2039.0052.8526.0055.3529.00
36136.0591.0089.8564.00104.3563.0074.1032.00
47539.95361.00344.90234.00338.80206.00325.30159.00
5127,138.8019,890.0019,278.108,496.0017,015.3015,880.0014,763.858,082.00
5515,715.9510,327.007,821.257,182.0010,730.453,888.007,046.953,534.00
66167.95103.00125.2549.00130.8043.0063.1047.00
701,490.85867.00961.95333.00999.55418.00438.20311.00
7911,858.907,249.006,547.102,030.009,293.403,123.003,402.202,444.00
8334,658.3029,843.0029,021.3525,637.0036,524.5531,450.0027,563.3519,765.00

The CV is employed as a metric to assess the influence of enhancement strategies on algorithms’ stability and robustness. Tables 12-15 show the comparisons of CV among the four classical meta-heuristics and their variants, respectively. In Table 12, SPSO_SD receives the best Min values for eight out of ten cases. As shown in Table 13, the SABC_SD obtains the minimum values for eight cases, while the ABC_SD obtains the minimum values for two cases. As reported in Table 14, the SGA_SD obtains the minimum values for seven out of ten cases, while GA_SD gets the minimum values for two cases. In Table 15, SVNS_SD obtains the minimum values for eight cases, while the VNS_SD and SVNS_RD obtain the minimum value for one case. In each group, the algorithm with two Sarsa based strategies has smaller CV values for most instances. This means that the Sarsa based strategies can improve the stability and robustness of four meta-heuristics.

Table 12

Comparison of CV values for the PSO variant algorithm

No.TasksPSOPSO_SDSPSO_RDSPSO_SD
12519.69%3.67%6.64%7.50%
22710.87%7.32%4.45%3.89%
33618.14%9.90%10.86%3.96%
44710.90%6.31%5.09%4.96%
55117.89%8.99%6.76%5.42%
65519.36%4.44%5.42%6.96%
76612.75%8.59%9.38%2.61%
87114.54%9.20%7.68%2.42%
97915.19%2.39%8.01%1.04%
108319.08%8.83%11.41%7.32%
Table 13

Comparison of CV values for the ABC variant algorithm

No.TasksABCABC_SDSABC_RDSABC_SD
12512.47%9.87%9.00%3.48%
22715.86%5.24%8.77%2.65%
33615.25%4.51%3.82%2.73%
44718.74%5.39%9.59%6.76%
55117.69%5.47%7.33%6.84%
65518.41%8.99%8.79%8.79%
76616.51%8.28%7.33%6.96%
87118.64%6.12%6.52%8.08%
97911.42%7.70%8.84%6.67%
108318.76%12.47%16.29%7.76%
Table 14

Comparison of CV values for the GA variant algorithm

No.TasksGAGA_SDSGA_RDSGA_SD
12511.13%8.11%9.03%5.05%
22716.44%7.49%9.42%5.48%
33617.59%8.34%7.75%8.37%
44714.05%9.63%6.53%4.11%
55119.88%7.39%8.29%7.40%
65514.67%9.74%12.77%6.51%
76616.34%5.19%10.62%6.73%
87115.00%8.44%13.17%6.60%
97917.12%7.22%18.05%7.17%
108314.56%10.28%12.77%9.03%
Table 15

Comparison of CV values for the VNS variant algorithm

No.TasksVNSVNS_SDSVNS_RDSVNS_SD
12512.82%9.26%8.99%4.67%
22714.76%8.25%8.28%4.36%
33612.14%6.76%6.12%4.49%
44717.31%6.76%7.70%3.72%
55112.48%9.95%6.61%5.01%
65519.13%7.34%7.33%7.64%
76612.61%9.03%7.18%6.35%
87110.60%3.96%9.00%3.59%
97918.63%3.48%8.84%4.32%
108319.13%10.49%9.95%6.32%

Further comparisons and discussions

To further verify the performance difference among four algorithms with Sarsa based strategies, the Friedman test and Nemenyi post-test are executed. The results are represented in Table 16, Figures 17 and 18. In Table 16, the obtained asymptotic significance (Asymp.Sig) is far less than 0.05, indicating noteworthy disparities among the four algorithms. It implies that there exist significant variations in their performance. Figure 17 shows the mean rank of algorithms by Nemenyi post-test. It is obvious that the SPSO_SD has the smallest rank value (1.70). Figure 18 visually presents the algorithm ranking distribution for all cases. It can be seen from Figure 18 that the SPSO_SD ranks first for six out of ten instances and second for two cases. The SABC_SD and SVNS_SD have ranked first for 2 out of 10 instances, respectively. The SGA_SD has a bad rank for most instances. Hence, the SPSO_SD outperforms its peers in solving the concerned problems.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 17. The rank of Nemenyi post-hoc test.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 18. Friedman’s binary rank variance analysis.

Table 16

The results of Friedman test

N10
Chi-Square17.760
Df3
Asymp.Sig..000

Convergence analysis

In this section, we present curves depicting the convergence of four classical meta-heuristics. The results of cases with 47 tasks and 79 tasks are used, as shown in Figures 19 and 20. It is evident that the algorithms converge rapidly at the beginning and flatten out with the increasing computational time. SGA_GD algorithm gives the worst results. SVNS_SD has medium performance and more stability. SABC_SD fluctuates a lot during the convergence process. Compared with these algorithms, SPSO_SD has faster convergence and higher quality solutions. It can be observed that SPSO_SD has better efficiency and effectiveness than its peers.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 19. Convergence curves of all compared algorithms for task 47.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 20. Convergence curves of all compared algorithms for task 79.

Case study: aircraft engine disassembly

In the experimental part, the arithmetic instances include self-generated cases and some actual cases such as discarded cell phones for the case with 27 tasks, LCD TVs for the case with 36 tasks, laptop computers for the case with 47 tasks, and aircraft engines for the case with 51 tasks[59]. The case with 51 tasks is a specific case provided by a domestic airport. Figure 21 is the disassembly priority diagram of the aircraft engine disassembly case in our project.

Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients

Figure 21. Precedence graph of the aircraft engine disassembly case.

CONCLUSIONS AND FUTURE WORK

A mathematical model for DLSP is established, considering time interference and priority relationships. The main objective is to minimize both CT and hazard coefficients. According to the problem’s characteristics, a Sarsa algorithm with a reward feedback mechanism is employed in two stages for task allocation and guiding local search operators’ selection. The experiments solve ten instances of varying scales. Through analysis and discussion of the results, it is proven that the PSO algorithm combined with two Sarsa strategies exhibits strong competitiveness for solving DLSP problems.

In the future, the following directions will be addressed: (1) building upon existing challenges, consider more practical constraints, such as incomplete disassembly, multilateral disassembly, and other related disassembly issues; (2) try to combine more reinforcement learning algorithms to meta-heuristics for solve disassembly problems; (3) attempt to apply the combination of reinforcement learning and meta-heuristics to other scheduling problems[60-62].

DECLARATIONS

Authors’ contributions

Software, validation, conceptualization, methodology, writing- original draft preparation: Li D

Supervision, writing - review and editing, validation, formal analysis: Gao K

Methodology, investigation, instance collection: Ren Y, Fu Y

Data curation, investigation, methodology: Zhang R

Availability of data and materials

Not applicable.

Financial support and sponsorship

This study is partially supported by the National Natural Science Foundation of China under Grant 62173356, the Science and Technology Development Fund (FDCT), Macau SAR, under Grant 0019/2021/A, Zhuhai Industry-University-Research Project with Hongkong and Macao under Grant ZH22017002210014PWC, the Guangdong Basic and Applied Basic Research Foundation (2023A1515011531), Research on the Key Technologies for Scheduling and Optimization of Complex Distributed Manufacturing Systems (22JR10KA007).

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Copyright

© The Author(s) 2024.

REFERENCES

1. Liu J, Wang S. Balancing disassembly line in product recovery to promote the coordinated development of economy and environment. Sustainability 2017;9:309.

2. Gungor A, Gupta SM. Issues in environmentally conscious manufacturing and product recovery: a survey. Comput Ind Eng 1999;36:811-53.

3. Özceylan E, Kalayci CB, Güngör A, Gupta SM. Disassembly line balancing problem: a review of the state of the art and future directions. Int J Prod Res 2019;57:4805-27.

4. Paterson DAP, Ijomah WL, Windmill JFC. End-of-life decision tool with emphasis on remanufacturing. J Clean Prod 2017;148:653-64.

5. Güngör A, Gupta SM. Disassembly line in product recovery. Int J Prod Res 2002;40:2569-89.

6. Fu Y, Zhou M, Guo X, Qi L, Sedraoui K. Multiverse optimization algorithm for stochastic biobjective disassembly sequence planning subject to operation failures. IEEE Trans Syst Man Cybern Syst 2022;52:1041-51.

7. Lambert AJD. Disassembly sequencing: a survey. Int J Prod Res 2003;41:3721-59.

8. Kalayci CB, Gupta SM. Artificial bee colony algorithm for solving sequence-dependent disassembly line balancing problem. Expert Syst Appl 2013;40:7231-41.

9. Edis EB. Constraint programming approaches to disassembly line balancing problem with sequencing decisions. Comput Oper Res 2021;126:105111.

10. ÇİL ZA. An exact solution method for multi-manned disassembly line design with AND/OR precedence relations. Appl Math Model 2021;99:785-803.

11. Meng L, Zhang B, Ren Y, Sang H, Gao K, Zhang C. Mathematical formulations for asynchronous parallel disassembly planning of end-of-life products. Mathematics 2022;10:3854.

12. Bentaha ML, Battaïa O, Dolgui A. An exact solution approach for disassembly line balancing problem under uncertainty of the task processing times. Int J Prod Res 2015;53:1807-18.

13. Altekin FT, Akkan C. Task-failure-driven rebalancing of disassembly lines. Int J Prod Res 2012;50:4955-76.

14. He J, Chu F, Dolgui A, Zheng F, Liu M. Integrated stochastic disassembly line balancing and planning problem with machine specificity. Int J Prod Res 2022;60:1688-708.

15. McGovern SM, Gupta SM. Uninformed and probabilistic distributed agent combinatorial searches for the unary NP-complete disassembly line balancing problem. In: Proceedings Volume Environmentally Conscious Manufacturing V. 2005.

16. Chen M, Tan Y. SF-FWA: a self-adaptive fast fireworks algorithm for effective large-scale optimization. Swarm Evol Comput 2023;80:101314.

17. Singh P, Pasha J, Moses R, Sobanjo J, Ozguven EE, Dulebenets MA. Development of exact and heuristic optimization methods for safety improvement projects at level crossings under conflicting objectives. Reliab Eng Syst Saf 2022;220:108296.

18. Pasha J, Nwodu AL, Fathollahi-fard AM, et al. Exact and metaheuristic algorithms for the vehicle routing problem with a factory-in-a-box in multi-objective settings. Adv Eng Inform 2022;52:101623.

19. Singh E, Pillay N. A study of ant-based pheromone spaces for generation constructive hyper-heuristics. Swarm Evol Comput 2022;72:101095.

20. Fu Y, Ma X, Gao K, Li Z, Dong H. Multi-objective home health care routing and scheduling with sharing service via a problem-specific knowledge-based artificial bee colony algorithm. IEEE Trans Intell Transport Syst 2023:1-14.

21. Fu Y, Tian G, Li Z, Wang Z. Parallel machine scheduling with dynamic resource allocation via a master-slave genetic algorithm. IEEJ Trans Electr Electron Eng 2018;13:748-56.

22. Fu Y, Wang H, Huang M, Wang J. A decomposition based multiobjective genetic algorithm with adaptive multipopulation strategy for flowshop scheduling problem. Nat Comput 2019;18:757-68.

23. Ma X, Fu Y, Gao K, Zhu L, Sadollah A. A multi-objective scheduling and routing problem for home health care services via brain storm optimization. Complex Syst Model Simul 2023;3:32-46.

24. Liang P, Fu Y, Ni S, Zheng B. Modeling and optimization for noise-aversion and energy-awareness disassembly sequence planning problems in reverse supply chain. Environ Sci Pollut Res Int 2021.

25. Wang Y, Han Y, Wang Y, Li J, Gao K, Liu Y. An effective two-stage iterated greedy algorithm for distributed flowshop group scheduling problem with setup time. Expert Syst Appl 2023;233:120909.

26. Wang Y, Han Y, Wang Y, Tasgetiren MF, Li J, Gao K. Intelligent optimization under the makespan constraint: rapid evaluation mechanisms based on the critical machine for the distributed flowshop group scheduling problem. Eur J Oper Res 2023;311:816-32.

27. Wang Y, Han Y, Wang Y, Pan Q, Wang L. Sustainable scheduling of distributed flow shop group: a collaborative multi-objective evolutionary algorithm driven by indicators. IEEE Trans Evol Comput 2023.

28. Chen S, Pan QK, Gao L, Sang H. A population-based iterated greedy algorithm to minimize total flowtime for the distributed blocking flowshop scheduling problem. Eng Appl Artif Intell 2021;104:104375.

29. Huang YY, Pan QK, Gao L, Miao ZH, Peng C. A two-phase evolutionary algorithm for multi-objective distributed assembly permutation flowshop scheduling problem. Swarm Evol Comput 2022;74:101128.

30. Wang ZY, Pan QK, Gao L, Wang YL. An effective two-stage iterated greedy algorithm to minimize total tardiness for the distributed flowshop group scheduling problem. Swarm Evol Comput 2022;74:101143.

31. Kalayci CB, Gupta SM. A particle swarm optimization algorithm with neighborhood-based mutation for sequence-dependent disassembly line balancing problem. Int J Adv Manuf Technol 2013;69:197-209.

32. Tseng YJ, Yu FY, Huang FY. A green assembly sequence planning model with a closed-loop assembly and disassembly sequence planning using a particle swarm optimization method. Int J Adv Manuf Technol 2011;57:1183-97.

33. Bouazza S, Hassine H, Barkallah M, Amari S, Haddar M. Disassembly sequence optimization for profit and energy consumption using petri nets and particle swarm optimization. In: Ben Amar M, Bouguecha A, Ghorbel E, El Mahi A, Chaari F, Haddar M, editors. Advances in materials, mechanics and manufacturing II. Cham: Springer International Publishing; 2022. pp. 267-76.

34. Yeh WC. Optimization of the disassembly sequencing problem on the basis of self-adaptive simplified swarm optimization. IEEE Trans Syst Man Cybern A 2012;42:250-61.

35. Yeh WC. Simplified swarm optimization in disassembly sequencing problems with learning effects. Comput Oper Res 2012;39:2168-77.

36. Li WD, Xia K, Gao L, Chao KM. Selective disassembly planning for waste electrical and electronic equipment with case studies on liquid crystaldisplays. Robot Comput Integr Manuf 2013;29:248-60.

37. Wang K, Li X, Gao L, Li P, Sutherland JW. A discrete artificial bee colony algorithm for multiobjective disassembly line balancing of end-of-life products. IEEE Trans Cybern 2022;52:7415-26.

38. Guo H, Zhang L, Ren Y, Li Y, Zhou Z, Wu J. Optimizing a stochastic disassembly line balancing problem with task failure via a hybrid variable neighborhood descent-artificial bee colony algorithm. Int J Prod Res 2023;61:2307-21.

39. Zhang L, Wu Y, Zhao X, et al. A multi-objective two-sided disassembly line balancing optimization based on artificial bee colony algorithm: a case study of an automotive engine. Int J Pr Eng Man GT 2022;9:1329-47.

40. Ren Y, Gao K, Fu Y, Sang H, Li D, Luo Z. A novel Q-learning based variable neighborhood iterative search algorithm for solving disassembly line scheduling problems. Swarm Evol Comput 2023;80:101338.

41. Slama I, Ben-Ammar O, Dolgui A, Masmoudi F. Genetic algorithm and Monte Carlo simulation for a stochastic capacitated disassembly lot-sizing problem under random lead times. Comput Ind Eng 2021;159:107468.

42. Wu T, Zhang Z, Zeng Y, Zhang Y. Mixed-integer programming model and hybrid local search genetic algorithm for human-robot collaborative disassembly line balancing problem. Int J Prod Res 2023;62:1758-82.

43. Tseng HE, Chang CC, Lee SC, Huang YM. A block-based genetic algorithm for disassembly sequence planning. Expert Syst Appl 2018;96:492-505.

44. Go TF, Wahab DA, Rahman MNA, Ramli R, Hussain A. Genetically optimised disassembly sequence for automotive component reuse. Expert Syst Appl 2012;39:5409-17.

45. Alshibli M, El Sayed A, Kongar E, Sobh TM, Gupta SM. Disassembly sequencing using tabu search. J Intell Robot Syst 2016;82:69-79.

46. Wang K, Li X, Gao L, Li P, Gupta SM. A genetic simulated annealing algorithm for parallel partial disassembly line balancing problem. Appl Soft Comput 2021;107:107404.

47. Kalayci CB, Polat O, Gupta SM. A variable neighbourhood search algorithm for disassembly lines. J Manuf Technol Manag 2015;26:182-94.

48. Ren Y, Zhang C, Zhao F, Triebe MJ, Meng L. An MCDM-based multiobjective general variable neighborhood search approach for disassembly line balancing problem. IEEE Trans Syst Man Cybern Syst 2020;50:3770-83.

49. Akbay MA, Kalayci CB, Polat O. A parallel variable neighborhood search algorithm with quadratic programming for cardinality constrained portfolio optimization. Knowl Based Syst 2020;198:105944.

50. Liu H, Zhang L. Optimizing a disassembly sequence planning with success rates of disassembly operations via a variable neighborhood search algorithm. IEEE Access 2021;9:157540-9.

51. Tuncel E, Zeid A, Kamarthi S. Solving large scale disassembly line balancing problem with uncertainty using reinforcement learning. J Intell Manuf 2014;25:647-59.

52. Zhao X, Li C, Tang Y, Cui J. Reinforcement learning-based selective disassembly sequence planning for the end-of-life products with structure uncertainty. IEEE Robot Autom Lett 2021;6:7807-14.

53. Zhao F, Hu X, Wang L, Zhao J, Tang J, Jonrinaldi. A reinforcement learning brain storm optimization algorithm (BSO) with learning mechanism. Knowl Based Syst 2022;235:107645.

54. Karimi-Mamaghan M, Mohammadi M, Pasdeloup B, Meyer P. Learning to select operators in meta-heuristics: an integration of Q-learning into the iterated greedy algorithm for the permutation flowshop scheduling problem. Eur J Oper Res 2023;304:1296-330.

55. Wang L, Gao K, Lin Z, Huang W, Suganthan PN. Problem feature based meta-heuristics with Q-learning for solving urban traffic light scheduling problems. Appl Soft Comput 2023;147:110714.

56. Lin Z, Gao K, Wu N, Suganthan PN. Scheduling eight-phase urban traffic light problems via ensemble meta-heuristics and Q-learning based local search. IEEE Trans Intell Transport Syst 2023;24:14415-26.

57. Yu H, Gao KZ, Ma ZF, Pan YX. Improved meta-heuristics with Q-learning for solving distributed assembly permutation flowshop scheduling problems. Swarm Evol Comput 2023;80:101335.

58. Rummery GA, Niranjan M. On-line Q-learning using connectionist systems. Available from: https://www.researchgate.net/publication/2500611_On-Line_Q-Learning_Using_Connectionist_Systems. [Last accessed on 24 Jan 2024].

59. Li D, Gao K, Ren Y, Zhang R, Fu Y. The supplementary file of the paper “Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients”. Available from: https://www.researchgate.net/publication/377446410_The_supplementary_file_of_the_paper_Integrating_Metaheuristics_and_a_Sarsa_Algorithm_for_Disassembly_Scheduling_Problems_with_Cycle_Time_and_Hazard_Coefficients. [Last accessed on 24 Jan 2024].

60. Li H, Gao K, Duan PY, Li JQ, Zhang L. An improved artificial bee colony algorithm with Q-learning for solving permutation flow-shop scheduling problems. IEEE Trans Syst Man Cybern Syst 2023;53:2684-93.

61. Zhang Z, Liu H, Zhou M, Wang J. Solving dynamic traveling salesman problems with deep reinforcement learning. IEEE Trans Neural Netw Learn Syst 2023;34:2119-32.

62. Zhao F, Di S, Wang L. A hyperheuristic with Q-learning for the multiobjective energy-efficient distributed blocking flow shop scheduling problem. IEEE Trans Cybern 2023;53:3337-50.

Cite This Article

Export citation file: BibTeX | RIS

OAE Style

Li D, Gao K, Ren Y, Zhang R, Fu Y. Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients. Green Manuf Open 2024;2:2. http://dx.doi.org/10.20517/gmo.2023.091901

AMA Style

Li D, Gao K, Ren Y, Zhang R, Fu Y. Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients. Green Manufacturing Open. 2024; 2(1): 2. http://dx.doi.org/10.20517/gmo.2023.091901

Chicago/Turabian Style

Li, Dachao, Kaizhou Gao, Yaxian Ren, Ruixue Zhang, Yaping Fu. 2024. "Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients" Green Manufacturing Open. 2, no.1: 2. http://dx.doi.org/10.20517/gmo.2023.091901

ACS Style

Li, D.; Gao K.; Ren Y.; Zhang R.; Fu Y. Integrating meta-heuristics and a Sarsa algorithm for disassembly scheduling problems with cycle time and hazard coefficients. Green. Manuf. Open. 2024, 2, 2. http://dx.doi.org/10.20517/gmo.2023.091901

About This Article

Special Issue

© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Data & Comments

Data

Views
268
Downloads
63
Citations
0
Comments
0
3

Comments

Comments must be written in English. Spam, offensive content, impersonation, and private information will not be permitted. If any comment is reported and identified as inappropriate content by OAE staff, the comment will be removed without notice. If you have any queries or need any help, please contact us at support@oaepublish.com.

0
Download PDF
Cite This Article 2 clicks
Like This Article 3 likes
Share This Article
Scan the QR code for reading!
See Updates
Contents
Figures
Related
Green Manufacturing Open
ISSN 2835-7590 (Online)
Follow Us

Portico

All published articles will preserved here permanently:

https://www.portico.org/publishers/oae/

Portico

All published articles will preserved here permanently:

https://www.portico.org/publishers/oae/