Doubly high-dimensional contextual bandits: An interpretable model with applications to assortment/pricing
This is a joint seminar organized by the Department of Statistics and Actuarial Science, Faculty of Science, and HKU Business School’s IIM Area.
Prof. Linda Zhao
Department of Statistics
The Wharton School
University of Pennsylvania
We consider contextual bandits that are doubly high-dimensional in the sense that both covariates and actions are allowed to take values in high-dimensional spaces. We propose a simple model that captures the interactions between covariates and actions via a (near) low-rank representation matrix. The resulting class of models is reasonably expressive while remaining interpretable, and includes various structured linear bandit models as particular cases. We propose a computationally tractable procedure that combines an exploration/exploitation protocol with an efficient low-rank matrix estimator, and we prove bounds on its regret. Simulation results show that this method has lower regret than state-of-the-art methods applied to various standard bandit models. We also apply our method to a real-world online retail data set involving assortment and pricing; in contrast to most existing methods, our method allows the assortment and pricing problems to be solved simultaneously. We demonstrate the effectiveness of this joint approach for revenue maximization.
Joint work with J. Cai, R. Chen and M. Wainwright.
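For readers unfamiliar with the setting, the following is a minimal illustrative sketch, not the speakers' procedure. It assumes a bilinear reward model r = x^T Theta a + noise with a low-rank matrix Theta, and it substitutes a plain explore-then-commit scheme with least squares followed by a truncated SVD for the estimator described in the abstract. All names, dimensions, and the restriction to unit-norm actions are hypothetical choices made for illustration.

```python
import numpy as np


def explore_and_estimate(d_x=20, d_a=15, rank=2, n_explore=2000,
                         noise=0.1, seed=0):
    """Simulate an exploration phase and estimate a low-rank Theta.

    Assumed model (illustrative only): reward = x^T Theta a + noise.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground-truth low-rank interaction matrix.
    U = rng.normal(size=(d_x, rank))
    V = rng.normal(size=(d_a, rank))
    theta = U @ V.T / np.sqrt(d_x * d_a)

    # Exploration: random covariates and random actions.
    X = rng.normal(size=(n_explore, d_x))
    A = rng.normal(size=(n_explore, d_a))
    rewards = np.einsum("ni,ij,nj->n", X, theta, A)
    rewards = rewards + noise * rng.normal(size=n_explore)

    # Least squares on the flattened outer products x a^T.
    features = np.einsum("ni,nj->nij", X, A).reshape(n_explore, d_x * d_a)
    coef, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    theta_ls = coef.reshape(d_x, d_a)

    # Truncated SVD: keep only the top `rank` singular directions.
    u, s, vt = np.linalg.svd(theta_ls, full_matrices=False)
    theta_hat = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return theta, theta_ls, theta_hat


def greedy_action(theta_hat, x):
    """Exploitation step, assuming actions are constrained to unit norm.

    Maximizing x^T Theta_hat a over unit vectors a gives the normalized
    vector Theta_hat^T x.
    """
    direction = theta_hat.T @ x
    return direction / np.linalg.norm(direction)


if __name__ == "__main__":
    theta, theta_ls, theta_hat = explore_and_estimate()
    rel_err = np.linalg.norm(theta_hat - theta) / np.linalg.norm(theta)
    print("relative Frobenius error of low-rank estimate:", rel_err)
```

In this sketch the low-rank structure is what makes the doubly high-dimensional problem manageable: the interaction matrix has d_x * d_a entries, but only about rank * (d_x + d_a) effective parameters.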
Prof. Zhao is a professor of statistics at the University of Pennsylvania. After obtaining her PhD in Mathematics/Statistics from Cornell University, she taught at UCLA for one year and then joined the Wharton School, University of Pennsylvania. Her research areas include statistical machine learning, data-driven decision-making, crowdsourcing, post-selection inference, network analysis, nonparametric Bayes, equity ownership, and education in data science.