International Joint Conference On Theoretical Computer Science – Frontier of Algorithmic Wisdom

August 15-19, 2022, City University of Hong Kong, Hong Kong


Invited Speakers

Undergraduate Research Forum

Towards Data Efficiency in Offline Reinforcement Learning

Baihe Huang

Peking University

Exploiting historical data to improve the decision-making strategies of intelligent systems, known as offline reinforcement learning (RL), offers a promising avenue for applying RL to many real-world problems where online access to the environment carries high costs or risks. At the heart of such methods is the question of how an agent can learn from fewer samples while maximizing expected outcomes in a vast, dynamic environment. However, sample-efficiency guarantees for offline RL often rely on strong assumptions about the function approximation and the coverage of the offline data. In this talk, I will discuss a simple algorithm based on the primal-dual formulation of Markov Decision Processes that enjoys polynomial sample complexity under relaxed conditions. The presented algorithm will be complemented by extensions and analyses that provide a deeper understanding of primal-dual algorithms in offline RL.
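For context, the primal-dual view mentioned in the abstract refers to the classical linear-programming formulation of a discounted MDP. A sketch in standard notation (this is textbook background, not the specific algorithm of the talk): with discount factor $\gamma$, initial distribution $\rho$, and occupancy measure $\mu(s,a)$, the primal LP is

```latex
\begin{align*}
\max_{\mu \ge 0} \quad & \sum_{s,a} \mu(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a} \mu(s,a) = (1-\gamma)\,\rho(s)
  + \gamma \sum_{s',a'} P(s \mid s', a')\, \mu(s',a') \quad \forall s,
\end{align*}
```

whose dual, over value functions $v(s)$, is

```latex
\begin{align*}
\min_{v} \quad & (1-\gamma) \sum_{s} \rho(s)\, v(s) \\
\text{s.t.} \quad & v(s) \ge r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, v(s')
  \quad \forall s, a.
\end{align*}
```

Primal-dual offline RL algorithms optimize saddle-point versions of this pair from logged data, which is what allows the coverage and function-approximation assumptions to be relaxed.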

Baihe Huang is a fourth-year undergraduate student at Peking University. He will join UC Berkeley as a CS Ph.D. student in fall 2022.