
AMCS Colloquium

Friday, October 21, 2022 - 1:45pm

Yanjun Han

Massachusetts Institute of Technology

Location

University of Pennsylvania

PCPE 200

Much of the existing literature on bandits and reinforcement learning assumes a linear reward/value function, but what happens if the reward is non-linear? Two curious phenomena arise for non-linear bandits: first, in addition to the "learning phase" with a standard \Theta(\sqrt{T}) regret, there is an "initialization phase" with a fixed cost determined by the reward function; second, achieving the smallest cost of the initialization phase requires new learning algorithms beyond traditional ones such as UCB. For a special family of non-linear bandits, we derive upper and lower bounds on the optimal fixed cost and, in addition, on the entire learning trajectory in the initialization phase via differential equations. In particular, we show that a two-stage algorithm which first finds a good initialization and then treats the problem as a locally linear bandit is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on online regression oracles, are provably suboptimal.

This talk is based on recent joint work with Jiantao Jiao, Nived Rajaraman, and Kannan Ramchandran.
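For readers who want a concrete picture of the two-stage idea in the abstract, below is a minimal Python/NumPy sketch. All modeling choices here are illustrative assumptions, not the algorithm from the talk: a hypothetical reward r = f(<theta*, x>) + noise with an odd non-linear link (f(z) = z^3), a simple moment-based estimate as the "initialization phase", and a plain LinUCB over actions near the initialization as the "locally linear" second stage.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T0, T = 5, 200, 2000              # dimension, init budget, horizon

# Hypothetical model (illustrative only): reward r = f(<theta*, x>) + noise
# with an odd non-linear link f.
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
f = lambda z: z ** 3
pull = lambda x: f(x @ theta_star) + 0.1 * rng.normal()

# --- Stage 1: initialization phase (fixed cost T0) -------------------
# Play random unit actions; for odd f and symmetric x, the moment
# average E[r * x] points along theta*, giving a crude initialization.
g = np.zeros(d)
for _ in range(T0):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    g += pull(x) * x
theta0 = g / np.linalg.norm(g)       # initialization for stage 2

# --- Stage 2: locally linear bandit ----------------------------------
# Treat the problem as approximately linear near theta0: run LinUCB
# over a small set of unit actions clustered around the initialization.
arms = [theta0] + [theta0 + 0.2 * rng.normal(size=d) for _ in range(20)]
arms = [a / np.linalg.norm(a) for a in arms]
A, b = np.eye(d), np.zeros(d)        # ridge-regression statistics
for t in range(T - T0):
    theta_hat = np.linalg.solve(A, b)
    ucb = [a @ theta_hat + 0.5 * np.sqrt(a @ np.linalg.solve(A, a))
           for a in arms]
    x = arms[int(np.argmax(ucb))]
    r = pull(x)
    A += np.outer(x, x)
    b += r * x

print("alignment <theta0, theta*>:", theta0 @ theta_star)
```

The split mirrors the structure described in the abstract: a fixed exploration budget spent purely on locating a good starting point, followed by a standard linear-bandit routine run locally; the specific estimators and constants above are placeholders.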

Bio: Yanjun Han is a Norbert Wiener postdoctoral associate in the Statistics and Data Science Center at MIT, mentored by Sasha Rakhlin and Philippe Rigollet. He received his Ph.D. in Electrical Engineering from Stanford University in August 2021, under the supervision of Tsachy Weissman. After that, he spent one year as a postdoctoral scholar at the Simons Institute for the Theory of Computing, UC Berkeley. Starting in September 2023, he will be an assistant professor of mathematics and data science at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU. His research interests lie in statistical machine learning, high-dimensional and nonparametric statistics, online learning and bandits, and information theory.