Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow

Publish date: 2024-10-14

Speaker: Mo Zhou

Date: Oct. 15, Tuesday

Beijing Time: 09:00am-11:00am

Tencent: 563-269-854

Abstract:

We propose an actor-critic framework for solving the time-continuous stochastic optimal control problem. The critic computes the value function with a least-squares temporal difference method, and the actor improves the policy with a policy gradient method. Our key contribution is establishing the global convergence of the proposed actor-critic flow at a linear rate. The theoretical findings are further validated through numerical examples, which show the efficacy of the approach in practical applications.

Biography:

Mo Zhou, an assistant adjunct professor in the Department of Mathematics at UCLA, works under the mentorship of Prof. Stan Osher and is also a member of Prof. Hayden Schaeffer’s group. Previously, he was a graduate student at Duke University, where he was advised by Prof. Jianfeng Lu. His research interests include deep learning, reinforcement learning, optimal control, and mean-field control and games. Currently, he is developing new algorithms and conducting theoretical analysis on mean-field control and games.

Center for Mathematical Sciences, Huazhong University of Science and Technology