ZHAO Duo, XIE Guanhao, WANG Yewen, ZHAO Wenjie, HUANG Chen, YUAN Zhaohui. Online Motion Planning for Inspection Manipulator Based on Adaptive Proximal Policy Optimization Algorithm[J]. Journal of Southwest Jiaotong University. doi: 10.3969/j.issn.0258-2724.20240085

Online Motion Planning for Inspection Manipulator Based on Adaptive Proximal Policy Optimization Algorithm

doi:10.3969/j.issn.0258-2724.20240085

School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611731, China

Received Date: 23 Feb 2024
Revised Date: 30 Apr 2024

Available Online: 07 Nov 2025

Abstract

To support human-robot collaboration, in which an inspection manipulator actively cooperates with a person working under a railroad car, and to accelerate the convergence of the proximal policy optimization (PPO) algorithm, an adaptive PPO (a-PPO) algorithm is proposed and applied to the online motion planning of the inspection manipulator. First, the system model is designed to output policy actions immediately from the current environmental state. Second, geometric reinforcement learning is introduced to construct the reward function, using the agent's exploration to continuously optimize the reward distribution. Third, the clipping value is determined adaptively from the similarity between the policies before and after each update, yielding the a-PPO algorithm. Finally, the improvement achieved by a-PPO is evaluated on two-dimensional maps, and the feasibility and effectiveness of its application are verified experimentally in both simulated and real train scenarios. The results indicate that, in the two-dimensional simulation, a-PPO shows an advantage in convergence speed over the other PPO algorithms compared. Path stability also improves: the standard deviation of the average path length is 16.786% lower than that of the PPO algorithm and 66.179% lower than that of the Informed-RRT* algorithm. In the application experiments in both simulated and real train scenarios, the manipulator dynamically adjusts its target points and actively avoids dynamic obstacles during motion, demonstrating its adaptability to dynamic environments.
Keywords:
- reinforcement learning
- deep learning
- motion planning
- manipulator
- railroad car
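
The abstract states only that the clipping value is set adaptively from the similarity between the policies before and after an update; it does not give the rule itself. The following is a minimal sketch of that idea under stated assumptions, not the authors' method: similarity is measured here by the KL divergence between two discrete action distributions, and the mapping in `adaptive_clip` (including `eps_min`, `eps_max`, and `kl_scale`) is hypothetical. Only `clipped_surrogate` follows the standard PPO clipped objective.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete action distributions given as probability lists.

    Assumes every q[i] is strictly positive wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def adaptive_clip(kl, eps_min=0.1, eps_max=0.3, kl_scale=0.01):
    """Hypothetical rule (not from the paper): map policy similarity to a clip value.

    Similar policies (small KL) get a wide clip range, close to eps_max;
    dissimilar policies (large KL) get a narrow one, close to eps_min.
    """
    return eps_min + (eps_max - eps_min) * math.exp(-kl / kl_scale)


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped surrogate objective for a single sample."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    old_policy = [0.5, 0.5]   # illustrative action probabilities before the update
    new_policy = [0.6, 0.4]   # illustrative action probabilities after the update
    kl = kl_divergence(old_policy, new_policy)
    eps = adaptive_clip(kl)
    ratio = new_policy[0] / old_policy[0]  # probability ratio for action 0
    print("KL:", kl, "clip value:", eps)
    print("surrogate:", clipped_surrogate(ratio, advantage=1.0, eps=eps))
```

In this sketch the clip value stays within `[eps_min, eps_max]` by construction, so the update never becomes less constrained than a fixed-clip PPO with `eps_max`. Whether the paper widens or narrows the clip range as the policies diverge is not stated in the abstract.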
