Approaches to Building Multi-Agent Systems and Deep Reinforcement Learning

O.A. Oursatyev; O.Ye. Volkov

doi:10.15407/intechsys.2025.03.030

Author(s)

O.A. Oursatyev, Institute of Information Technologies and Systems of the NAS of Ukraine, https://orcid.org/0009-0009-8323-0525
O.Ye. Volkov, https://orcid.org/0000-0002-5418-6723

DOI:

https://doi.org/10.15407/intechsys.2025.03.030

Keywords:

unmanned mobile objects, unmanned aerial vehicles (UAVs), UAV swarm control, deep reinforcement learning, mental world model, training a neural network in imagination using a world model, agent learning paradigms and task execution schemes in a multi-agent environment

Abstract

The paper reviews international experience in developing and applying artificial intelligence based on deep reinforcement learning of neural networks to solve problems that mobile objects face in unknown, possibly partially observable environments for which no mathematical model exists. It presents a taxonomy of the tasks that arise in controlling a UAV or a drone swarm, together with the model-free deep reinforcement learning algorithms proposed for each of them. The UAV control tasks are formalized mathematically in reinforcement learning terms; in particular, the learning paradigm in a multi-agent environment is considered. Solutions to several UAV application tasks are also reviewed.

References

Azar A.T., Koubaa A., Ali Mohamed N. et al. Drone Deep Reinforcement Learning: A Review. Electronics, 2021, Vol. 10(9), 1–30. https://doi.org/10.3390/electronics10090999

Oursatyev O.A., Volkov O.Ye., Tkalya V.H. Automated Machine Learning. State and Prospects Development. Information Technologies and Systems, Vol. 2(2), 3–33. [In Ukrainian: Автоматизоване машинне навчання. Стан та перспективи розвитку. Information Technologies and Systems (Інформаційні технології та системи)] https://doi.org/10.15407/intechsys.2025.02.003

Schmidhuber J. Deep Learning in Neural Networks: An Overview. Neural Networks, 2015, Vol. 61, 85–117. https://doi.org/10.1016/j.neunet.2014.09.003

Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, London, England, 2018, 526 p. URL: https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf

Schmidhuber J. Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments. IDSIA, 1990. URL: https://people.idsia.ch/~juergen/FKI-126-90_(revised)bw_ocr.pdf

Schmidhuber J. Reinforcement Learning in Markovian and Non-Markovian Environments. IDSIA, 1991. URL: https://sferics.idsia.ch/pub/juergen/nipsnonmarkov.pdf

Arulkumaran K. et al. Deep reinforcement learning: A brief survey, 2017. https://doi.org/10.1109/MSP.2017.2743240

Zhang H., Yu T. Taxonomy of Reinforcement Learning Algorithms. Deep Reinforcement Learning, Springer Singapore, 2020. https://doi.org/10.1007/978-981-15-4095-0_3

Schmidhuber J. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. IJCNN International Joint Conference on Neural Networks, 1990, Vol. 2, 253–258. https://doi.org/10.1109/IJCNN.1990.137723

Nagabandi A. et al. PDDM: Planning with Deep Dynamics Models for Learning Dexterous Manipulation. Conference on Robot Learning (CoRL), 2019. URL: https://sites.google.com/view/pddm

Poole D.L., Mackworth A.K. AI Foundations of Computational Agents. University of British Columbia, TJ Books Limited, Padstow, Cornwall, 2023, 870 p. https://doi.org/10.1017/9781009258227

Francois-Lavet V., Henderson P. et al. An Introduction to Deep Reinforcement Learning. 2018. https://doi.org/10.1561/9781680835397

Ha D., Schmidhuber J. World Models. Can agents learn inside of their own dreams? NIPS 2018, March 27 2018, Oral Presentation. https://doi.org/10.5281/zenodo.1207631

Anastasis Germanidis. Introducing General World Models. URL: https://research.runwayml.com/introducing-general-world-models [Accessed Dec. 2023]

Gronauer S., Diepold K. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, 2021, 1–49. URL: https://link.springer.com/article/10.1007/s10462-021-09996-w

Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. A Bradford Book, The MIT Press, Cambridge, Massachusetts, London, England, 2015, 1–337. URL: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

Littman M.L. Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings, 11th International Conference, Rutgers University, New Brunswick, NJ, 10–13 Jul. 1994, 157–163. https://doi.org/10.1016/B978-1-55860-335-6.50027-1

Fully Observable vs. Partially Observable Environment in AI. URL: https://www.geeksforgeeks.org/fully-observable-vs-partially-observable-environment-in-ai/ [Accessed May 2024]

Tožička J., Szulyovszky B., de Chambrier G. et al. Application of deep reinforcement learning to UAV fleet control. SAI Intelligent Systems Conference, London, UK, 5–6 Sept. 2018, 1169–1177. https://doi.org/10.1007/978-3-030-01057-7_85

Liu C.H. et al. Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach. IEEE J. Sel. Areas Commun., Aug. 2018, Vol. 36 (9), 2059–2070. https://doi.org/10.1109/JSAC.2018.2864373

Yang J. et al. Application of reinforcement learning in UAV cluster task scheduling. Future Gener. Comput. Syst. 2019, Vol. 95, 140–148. https://doi.org/10.1016/j.future.2018.11.014

Wang C. et al. Autonomous navigation of UAV in large-scale unknown complex environment with deep reinforcement learning. IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, QC, Canada, 14–16 Nov. 2017, 858–862. https://doi.org/10.1109/GlobalSIP.2017.8309082

Huang Y., Wei G., Wang Y. V-D D3QN: the Variant of Double Deep Q-Learning Network with Dueling Architecture. 37th Chinese Control Conference (CCC), Wuhan, China, 2018, 9130–9135. https://doi.org/10.23919/ChiCC.2018.8483478

Imanberdiyev N., Fu C., Kayacan E., Chen I.M. Autonomous navigation of UAV by using real-time model-based reinforcement learning. 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand, 13–15 Nov. 2016, 1–6. https://doi.org/10.1109/ICARCV.2016.7838739

Hester T., Stone P. TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots. AAAI Technical Report SS-12-02. Designing Intelligent Robots: Reintegrating AI. Department of Computer Science, The University of Texas at Austin. URL: https://cdn.aaai.org/ocs/4271/4271-19461-1-PB.pdf

Moerland T.M. Model-based Reinforcement Learning: A Survey. 2020, 67 p. URL: http://arxiv.org/pdf/2006.16712

Bou-Ammar H., Voos H., Ertel W. Controller Design for Quadrotor UAVs using Reinforcement Learning. IEEE International Conference on Control Applications, Yokohama, Japan, 8–10 Sept. 2010, 2130–2135. https://doi.org/10.1109/CCA.2010.5611206

Henderson P. et al. Deep Reinforcement Learning that Matters. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018. https://doi.org/10.1609/aaai.v32i1.11694