Adaptively Learning to Rank Unstructured Items in Online Platforms
Dr. Ruohan Zhan
Assistant Professor
Department of Industrial Engineering and Decision Analytics
The Hong Kong University of Science and Technology
We consider the problem of learning to adaptively rank a list of unstructured items to optimize cumulative user engagement in online platforms. We formulate this problem via a contextual-bandit approach, with actions being ranking items that account for user characteristics and position effects. For each item ranked at a specified position, we adjust its predicted user engagement score by an upper confidence bound, coined as optimistic score, to balance exploration and exploitation. Our algorithm applies the ranking action that maximizes the aggregation of optimistic engagement scores from the item list, which we solve by employing techniques from maximum weight matching. Under the generalized linear model assumption of user engagement scores, we prove that our algorithm achieves the regret of $\Tilde{O}(Kd\sqrt{T})$ when ranking K items in a d-dimensional feature space over T rounds. This regret relieves the dependence on the size of action space that contains K! different rankings. Experimental results demonstrate the improved user engagement of our algorithm as compared to alternative baselines.
This is joint work with Perry Dong, Ying Jin, and Zhengyuan Zhou.