GASP-D3QN: Geometry-Aware Safety-Prioritized Dueling Double Q-Network for Online UAV Path Planning

Authors

  • Kexiao Wu North China Electric Power University
  • Bingyu Yang Guizhou University
  • Mingshen Xu North China Electric Power University

DOI:

https://doi.org/10.62177/jaet.v3i2.1272

Keywords:

UAV Path Planning, Deep Reinforcement Learning, Dueling Double Q-Network, Safety Shield, Geometry-Aware Representation, Online Decision Making

Abstract

Online three-dimensional path planning for unmanned aerial vehicles (UAVs) in cluttered environments remains challenging because decision-making must be performed under partial geometric observation, dynamic uncertainty, and strict safety constraints. This paper presents GASP-D3QN, a value-based reinforcement learning framework that integrates a geometry-aware hybrid encoder, a hard safety shield, a prior-guided action selector, and a dueling double-Q backbone with prioritized replay. The proposed design explicitly separates global kinematic cues from a local occupancy cube and introduces task priors related to goal progress, clearance, energy cost, and heading stability. To ensure an application-consistent evaluation, the comparison includes both learning-based online baselines and a classical reference method under the same observation and action interface. On the standard benchmark, GASP-D3QN achieves the best overall performance among the learning-based methods, with a success rate of 0.50 and an average return of 383.19, while maintaining competitive energy consumption. In additional experiments under denser obstacles and out-of-distribution wind disturbances, GASP-D3QN retains the same advantage over the learning-based baselines. Ablation studies further show that the geometry-aware encoder, safety shield, and action prior each contribute materially to the final result. These findings indicate that explicit geometric modeling and safety-aware action filtering provide an effective and practical recipe for learned online UAV navigation.
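The two mechanisms named in the abstract, the hard safety shield and the dueling Q-value head, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: a six-action axis-aligned grid interface, a binary local occupancy cube, and hypothetical function names. The shield marks an action admissible only if the cell it would enter is free, and the greedy action is then taken over the shielded dueling Q-values.

```python
import numpy as np

def dueling_q(value, advantages):
    # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    return value + advantages - advantages.mean()

def shield_mask(occupancy, pos, moves):
    # Hard safety shield: an action is admissible only if the cell it
    # would enter lies inside the local cube and is unoccupied.
    mask = np.zeros(len(moves), dtype=bool)
    for i, delta in enumerate(moves):
        nxt = pos + delta
        inside = np.all((nxt >= 0) & (nxt < np.array(occupancy.shape)))
        mask[i] = bool(inside) and occupancy[tuple(nxt)] == 0
    return mask

def select_action(value, advantages, occupancy, pos, moves):
    # Greedy selection restricted to shielded (admissible) actions.
    q = dueling_q(value, advantages)
    q_shielded = np.where(shield_mask(occupancy, pos, moves), q, -np.inf)
    return int(np.argmax(q_shielded))

# Example: a 3x3x3 local occupancy cube centred on the UAV, with the
# cell in the +x direction blocked by an obstacle.
occ = np.zeros((3, 3, 3), dtype=int)
occ[2, 1, 1] = 1  # obstacle blocking the +x move
pos = np.array([1, 1, 1])
moves = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]])
adv = np.array([5.0, 1.0, 4.0, 0.0, 2.0, 3.0])  # +x has the highest advantage
chosen = select_action(0.0, adv, occ, pos, moves)  # shield vetoes +x
```

Here the unshielded greedy choice would be the +x move (advantage 5.0), but the shield vetoes it and the selector falls back to the best admissible action, illustrating how filtering acts before, rather than inside, the learned value function.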




How to Cite

Wu, K., Yang, B., & Xu, M. (2026). GASP-D3QN: Geometry-Aware Safety-Prioritized Dueling Double Q-Network for Online UAV Path Planning. Journal of Advances in Engineering and Technology, 3(2). https://doi.org/10.62177/jaet.v3i2.1272

Section

Articles

Dates

Received: 2026-04-09
Accepted: 2026-04-13
Published: 2026-04-20