• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
  • Scopus
  • Indexed by Core Journals of China, Chinese S&T Journal Citation Reports
  • Chinese Science Citation Database
ZHAO Duo, XIE Guanhao, WANG Yewen, ZHAO Wenjie, HUANG Chen, YUAN Zhaohui. Online Motion Planning for Inspection Manipulator Based on Adaptive Proximal Policy Optimization Algorithm[J]. Journal of Southwest Jiaotong University. doi: 10.3969/j.issn.0258-2724.20240085

Online Motion Planning for Inspection Manipulator Based on Adaptive Proximal Policy Optimization Algorithm

doi:10.3969/j.issn.0258-2724.20240085
  • Received: 23 Feb 2024
  • Revised: 30 Apr 2024
  • Available online: 07 Nov 2025
  • To support human-robot collaboration, in which an inspection manipulator actively cooperates with a worker beneath a railway vehicle, and to speed up the convergence of the proximal policy optimization (PPO) algorithm, an adaptive PPO (a-PPO) algorithm is proposed and applied to online motion planning of the inspection manipulator. First, a system model is designed that outputs a policy action directly from the current environmental state. Second, geometric reinforcement learning is introduced to construct the reward function, so that the agent's exploration continuously refines the distribution of rewards. Third, the clipping value is determined adaptively from the similarity between the policies before and after each update, which yields the a-PPO algorithm. Finally, a-PPO is compared with other PPO algorithms on two-dimensional maps, and its feasibility and effectiveness are verified experimentally in both simulated and real train scenarios. The results indicate that, in the two-dimensional simulations, a-PPO shows an advantage in convergence speed over the other PPO algorithms compared. Path stability also improves: the standard deviation of the average path length is 16.786% lower than that of the PPO algorithm and 66.179% lower than that of the Informed-RRT* algorithm. In the simulated and real train experiments, the manipulator adjusts its target point dynamically and actively avoids moving obstacles during motion, which reflects its adaptability to dynamic environments.
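
The adaptive clipping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not state which similarity measure is used or how it maps to a clipping value, so the KL divergence, the function names (`kl_divergence`, `adaptive_clip`, `clipped_surrogate`) and every constant (`eps_base`, `kl_target`, `eps_min`, `eps_max`) are assumptions made for illustration. Only the clipped surrogate itself follows the standard PPO objective.

```python
import math


def kl_divergence(p_old, p_new):
    """KL(p_old || p_new) for two discrete action distributions."""
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def adaptive_clip(kl, eps_base=0.2, kl_target=0.01, eps_min=0.05, eps_max=0.3):
    """Hypothetical rule (an assumption, not the paper's formula):
    narrow the clip range when the old and new policies diverge,
    widen it when they stay similar, within [eps_min, eps_max]."""
    eps = eps_base * kl_target / (kl + 1e-8)
    return max(eps_min, min(eps_max, eps))


def clipped_surrogate(ratios, advantages, eps):
    """Standard PPO clipped objective, averaged over the samples."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        r_clipped = max(1.0 - eps, min(1.0 + eps, r))
        total += min(r * a, r_clipped * a)
    return total / len(ratios)


# Example: measure policy similarity, pick a clip value, evaluate the objective.
p_old = [0.5, 0.3, 0.2]      # action probabilities before the update
p_new = [0.45, 0.35, 0.2]    # action probabilities after the update
eps = adaptive_clip(kl_divergence(p_old, p_new))
objective = clipped_surrogate([1.4, 0.6], [1.0, -1.0], eps)
```

Under this assumed rule, a large policy change produces a small clipping value and therefore a more conservative update, while a small change allows a wider clip range. Any monotone mapping from similarity to clipping value could be substituted for `adaptive_clip`.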

    Figures(13)/Tables(2)
