International Joint Conference On Theoretical Computer Science – Frontier of Algorithmic Wisdom

August 15-19, 2022, City University of Hong Kong, Hong Kong


Invited Speakers

Track C

A Continuum of Solutions to Cooperative Multi-Agent Reinforcement Learning

Yaodong Yang

Peking University

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community. However, many research endeavours have focused on developing practical MARL algorithms whose effectiveness has been studied only empirically and which therefore lack theoretical guarantees. As recent studies have revealed, MARL methods often show unstable performance in terms of reward monotonicity or converge to suboptimal solutions. In this work, to resolve these issues, we introduce a novel framework named Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for MARL algorithmic designs. We prove that algorithms derived from the HAML template satisfy the desired properties of monotonic improvement of the joint reward and convergence to a Nash equilibrium. As a natural outcome of our theory, we propose a continuum of effective cooperative MARL algorithms, HATRPO, HAPPO, HAA2C and HADDPG, and demonstrate their effectiveness against strong baselines on StarCraft II, Multi-Agent MuJoCo and Bimanual Dex-hands manipulation tasks.

Yaodong is a machine learning researcher with ten years of working experience in both academia and industry. Currently, he is an assistant professor at Peking University. His research focuses on reinforcement learning and multi-agent systems. He has maintained a track record of more than forty publications at top conferences and journals, along with the best system paper award at CoRL 2020 (first author) and the best blue-sky paper award at AAMAS 2021 (first author). Before joining Peking University, he was an assistant professor at King's College London (KCL). Before KCL, he was a principal research scientist at Huawei U.K., where he headed the multi-agent system team in London. Before Huawei, he was a senior research manager at AIG, working on AI applications in finance. He holds a Ph.D. degree from University College London, an M.Sc. degree from Imperial College London and a Bachelor's degree from the University of Science and Technology of China.