Approaches to Creating Multiagent Systems and Deep Reinforcement Learning of Drones
DOI:
https://doi.org/10.15407/intechsys.2025.03.030
Keywords:
unmanned moving objects, UAVs, UAV swarm control, swarm of UAVs, deep reinforcement learning, DRL, world models, model-based approach to RL, training paradigms, execution scheme
Abstract
Introduction. Unmanned aerial vehicles (UAVs), commonly referred to as drones, are aircraft that fly without a human pilot on board. They are increasingly used for complex and diverse tasks in both civil and military domains. However, a number of problems in UAV development remain unsolved: flight path planning, navigation, and control. In complex systems, which certainly include UAVs, artificial intelligence (AI) is usually used to solve these problems and to ensure that the system functions as required, typically by means of deep reinforcement learning. Recent international experience with analytical platforms for controlling mobile objects, in particular UAVs, shows that deep neural networks can be applied to these tasks.
The purpose of the paper is to introduce domain experts whose primary job function is outside machine learning to the challenges of applying AI to these problems, and to robust and complex deep neural networks and their training, which remains difficult and requires large amounts of data and practical experience. Such outreach can serve as a form of citizen science and will contribute to the replication of research and the democratization of AI.
Results. Solutions to these problems based on deep reinforcement learning are analyzed, in particular control of a UAV swarm, and a taxonomy of model-free deep reinforcement learning algorithms applied to UAV tasks is given. Early solutions that use an environment model are also considered. Unfortunately, almost all of these works lack verification in real or near-real environmental conditions. The paper gives a brief overview of approaches to reinforcement learning problems, that is, interactions between agents and the environment in the course of step-by-step decision making. The overview covers the application of this approach to moving objects in complex and partially observable environments; model-free and model-based learning; and the mathematical formalization of UAV problems under reinforcement learning, including paradigms for training agents in a multi-agent environment (multi-agent reinforcement learning). Problems that arise in the multi-agent setting are discussed, such as non-stationarity of the environment from the point of view of a single agent, relative overgeneralization, and the credit assignment problem. The formal concepts underlying multi-agent reinforcement learning are presented.
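The step-by-step agent-environment interaction mentioned in the Results can be illustrated with a minimal model-free sketch. It is not taken from the paper: the `Corridor` environment, the `train` function, and all hyperparameter values are hypothetical choices made for illustration, and tabular Q-learning stands in for the deep methods the paper surveys.

```python
import random


class Corridor:
    """Hypothetical 1-D toy environment: start in cell 0, goal in the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
        return self.state, reward, done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the agent never builds a model of the environment."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection (ties go to "right").
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # One-step temporal-difference target; no bootstrap at the goal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for every non-terminal cell (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

The agent here learns only from sampled transitions, which is what makes the sketch model-free; a model-based variant would additionally learn or be given the transition function used inside `step` and plan with it.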
Conclusions. An overview of methods for solving reinforcement learning problems is presented. On this basis, the authors conclude that further research should focus on model-based solutions and on the design of typical environments that meet the conditions required for performing the task. Such models may be easier and faster to adapt to the real environment.
Copyright (c) 2025 Information Technologies and Systems

This work is an Open Access paper licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).