REVEAL@RecSys2020: Workshop on Bandit and Reinforcement Learning from User Interactions
State-of-the-art recommender systems are notoriously hard to design and improve upon, due to their interactive and dynamic nature. In particular, they involve a multi-step decision-making process in which a stream of interactions occurs between the user and the system. Leveraging the reward signals from these interactions to build a scalable and performant recommendation model is a key challenge. Traditionally, the interactions are treated as independent to keep the problem tractable, but to improve recommender systems further, models will need to account for the delayed effects of each recommendation and reason and plan for longer-term user satisfaction. To this end, our workshop invites contributions that enable recommender systems to adapt effectively to diverse forms of user feedback and to optimize the quality of each user's long-term experience, with a particular focus on approaches that leverage bandit and reinforcement learning from user interactions.
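As a concrete illustration of this bandit framing of learning from user interactions, the sketch below runs a simple epsilon-greedy policy that recommends one of K items and updates its click-rate estimates online from (simulated) click feedback. The item set, click probabilities, and parameter values are illustrative assumptions, not part of the workshop material.

```python
import numpy as np

# Minimal epsilon-greedy bandit sketch: recommend one of K items per round,
# observe a simulated click, and update a running estimate of each item's value.
rng = np.random.default_rng(0)
K, epsilon, n_rounds = 5, 0.1, 5_000
true_click_probs = np.array([0.05, 0.10, 0.15, 0.40, 0.20])  # unknown to the learner

counts = np.zeros(K)
values = np.zeros(K)  # running mean reward per item

for _ in range(n_rounds):
    # Explore with probability epsilon, otherwise exploit the current best estimate.
    item = rng.integers(K) if rng.random() < epsilon else int(np.argmax(values))
    reward = rng.binomial(1, true_click_probs[item])  # simulated user click
    counts[item] += 1
    values[item] += (reward - values[item]) / counts[item]  # incremental mean update

print("Estimated click rates:", np.round(values, 3))
```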
Potential contributions include (but are not limited to):
Reinforcement learning and bandits for recommendation
Robust estimators, counterfactual and off-policy evaluation (see the sketch after this list)
Causal recommender systems
Using simulation for recommender systems evaluation
New evaluation datasets
New offline metrics for recommender systems
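To make the counterfactual and off-policy evaluation topic concrete, the following sketch estimates the value of a new recommendation policy from logged interactions using an inverse propensity scoring (IPS) estimator, with a simple propensity clip as a variance-control measure. The logging policy, item set, and click probabilities are hypothetical and chosen only for illustration.

```python
import numpy as np

# Off-policy evaluation sketch: estimate a target policy's expected reward
# from data logged under a different (here, uniform-random) policy via IPS.
rng = np.random.default_rng(0)
n_items, n_logs = 5, 10_000

# Logged data: the deployed logging policy chose items uniformly at random.
logging_propensities = np.full(n_logs, 1.0 / n_items)
logged_actions = rng.integers(n_items, size=n_logs)
# Simulated binary reward (e.g. a click); item 3 is assumed to be the best.
click_probs = np.array([0.05, 0.10, 0.15, 0.40, 0.20])
rewards = rng.binomial(1, click_probs[logged_actions])

# Target policy to evaluate: deterministically recommend item 3.
target_propensities = (logged_actions == 3).astype(float)

# IPS estimate of the target policy's value, clipping the logging propensities
# from below to cap the importance weights (a simple robustness measure).
weights = target_propensities / np.clip(logging_propensities, 1e-3, None)
ips_estimate = np.mean(weights * rewards)
print(f"IPS estimate of target policy value: {ips_estimate:.3f}")  # close to 0.40
```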