Speaker: Mo Zhou
Date: Tuesday, Oct. 15
Beijing Time: 9:00 am - 11:00 am
Tencent Meeting: 563-269-854
Abstract:
We propose an actor-critic framework for solving the continuous-time stochastic optimal control problem. A least-squares temporal difference (LSTD) method computes the value function for the critic, while the policy gradient method serves as the policy improvement step for the actor. Our key contribution is establishing global convergence of the proposed actor-critic flow, with a linear rate of convergence. Numerical examples validate the theoretical findings and demonstrate the efficacy of the approach in practical applications.
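To illustrate the actor-critic structure outlined in the abstract (an LSTD critic paired with a policy-gradient actor), below is a minimal Python sketch on a discretized one-dimensional linear-quadratic problem. The environment (dX = u dt + sigma dW with quadratic running cost), the quadratic critic features, the Gaussian exploration policy, and all step sizes are illustrative assumptions for this sketch, not the speaker's algorithm or analysis:

    # Minimal actor-critic sketch on a discretized 1D stochastic LQ problem.
    # Illustrative assumptions throughout; not the speaker's method.
    import numpy as np

    rng = np.random.default_rng(0)
    dt, sigma, beta = 0.01, 0.5, 1.0   # step size, noise level, discount rate
    gamma = np.exp(-beta * dt)         # per-step discount factor
    theta, eps = 0.0, 0.3              # feedback gain u = theta*x, exploration std
    lr = 0.05                          # actor learning rate

    def features(x):
        """Quadratic critic features: V(x) ~ w0 + w1 * x^2."""
        return np.array([1.0, x * x])

    def rollout(theta, n_steps=2000):
        """Simulate the controlled SDE dX = u dt + sigma dW under exploration."""
        x, traj = rng.normal(), []
        for _ in range(n_steps):
            u = theta * x + eps * rng.normal()       # sampled control
            c = (x * x + u * u) * dt                 # running cost increment
            x_next = x + u * dt + sigma * np.sqrt(dt) * rng.normal()
            traj.append((x, u, c, x_next))
            x = x_next
        return traj

    for it in range(200):
        traj = rollout(theta)

        # Critic: least-squares TD (LSTD) solve for the value-function weights.
        A = sum(np.outer(features(x), features(x) - gamma * features(xn))
                for x, _, _, xn in traj)
        b = sum(c * features(x) for x, _, c, _ in traj)
        w, *_ = np.linalg.lstsq(A, b, rcond=None)

        # Actor: policy-gradient step using the TD error as the advantage.
        grad = 0.0
        for x, u, c, xn in traj:
            delta = c + gamma * features(xn) @ w - features(x) @ w  # TD error
            grad += delta * (u - theta * x) / eps**2 * x            # score-function gradient
        theta -= lr * grad / len(traj)                              # descend the cost

    print(f"learned gain theta = {theta:.3f}")  # a stabilizing (negative) gain is expected

The quadratic features make the critic exact for this LQ example; for general problems one would replace them with a richer approximation, which is part of what makes the convergence analysis in the talk nontrivial.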
Biography:
Mo Zhou is an assistant adjunct professor in the Department of Mathematics at UCLA, working under the mentorship of Prof. Stan Osher; he is also a member of Prof. Hayden Schaeffer's group. Previously, he was a graduate student at Duke University, advised by Prof. Jianfeng Lu. His research interests include deep learning, reinforcement learning, optimal control, and mean-field control and games. He is currently developing new algorithms and conducting theoretical analysis for mean-field control and games.