RecList: A RecSys software library for behavioral testing
Patrick John Chia, Jacopo Tagliabue, Federico Bianchi, Chloe He, and Brian Ko have released a new software library for testing recommender systems. The source code is on GitHub, and a paper is forthcoming at The Web Conference (arXiv preprint).
RecList is an open-source library providing behavioral, “black-box” testing for recommender systems. Inspired by the pioneering work of Ribeiro et al. (2020) in NLP, we introduce a general plug-and-play procedure to scale up behavioral testing, with an easy-to-extend interface for custom use cases.
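As a taste of that plug-and-play interface, here is a minimal sketch of a custom suite based on the beta abstractions (a `RecList` base class and a `@rec_test` decorator, as shown in the project repo). The `_y_test` / `_y_preds` attributes and the exact shape of targets and predictions depend on the dataset wrapper, so treat the details as assumptions and check the repo for the current interface:

```python
from reclist.abstractions import RecList, rec_test

class MyCustomRecList(RecList):
    """A custom behavioral suite: subclass RecList, mark tests with @rec_test."""

    @rec_test(test_type='hit_rate_at_10')
    def hit_rate_at_10(self):
        # Fraction of test cases whose first ground-truth item
        # appears in the model's top-10 predictions.
        hits = sum(
            1 for target, preds in zip(self._y_test, self._y_preds)
            if target and target[0] in preds[:10]
        )
        return hits / len(self._y_test)
```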
While quantitative metrics over held-out data points are important, many more tests are needed for recommenders to function properly in the wild without eroding our confidence in them: for example, a model may boast an accuracy improvement over the entire dataset, yet be significantly worse than another on rare items or new users; or a model that correctly recommends HDMI cables as an add-on for shoppers buying a TV may also wrongly recommend TVs to shoppers just buying a cable.
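To make the rare-items failure mode concrete, here is a small, framework-free sketch (plain Python, not RecList's API) of a slice-based test: the same top-k metric computed overall and separately on frequent ("head") versus rare ("tail") items, where a model can win overall while losing badly on the tail. All names here are illustrative:

```python
from collections import Counter

def hit_rate(predictions, targets, k=10):
    """Fraction of cases where the target is in the top-k predictions."""
    hits = sum(1 for preds, t in zip(predictions, targets) if t in preds[:k])
    return hits / len(targets)

def hit_rate_by_slice(predictions, targets, train_items, min_freq=5, k=10):
    """Compare top-k hit rate on head vs. tail items (by training frequency)."""
    freqs = Counter(train_items)
    head = [(p, t) for p, t in zip(predictions, targets) if freqs[t] >= min_freq]
    tail = [(p, t) for p, t in zip(predictions, targets) if freqs[t] < min_freq]
    return {
        'overall': hit_rate(predictions, targets, k),
        'head': hit_rate(*zip(*head), k) if head else None,
        'tail': hit_rate(*zip(*tail), k) if tail else None,
    }
```

A model whose 'overall' score beats a baseline but whose 'tail' score collapses is exactly the kind of regression that aggregate leaderboard metrics hide.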
RecList’s goal is to operationalize these intuitions into a practical package for testing research and production models in a more nuanced way, without requiring unnecessary custom code or ad hoc procedures. To streamline comparisons among existing models, RecList ships with popular datasets and ready-made behavioral tests: read the TDS blog post for a gentle introduction to the main use cases, and check the paper for more details on the relevant literature.
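For the ready-made route, an end-to-end run looks roughly like the example below, adapted from the project README at the time of the beta (training a bundled prod2vec model on the bundled Coveo dataset and running the matching suite); module paths and class names may have changed since, so defer to the repo:

```python
from reclist.datasets import CoveoDataset
from reclist.recommenders.prod2vec import CoveoP2VRecModel
from reclist.reclists import CoveoCartRecList

# Load a public dataset and train a ready-made baseline model.
coveo_dataset = CoveoDataset()
model = CoveoP2VRecModel()
model.train(coveo_dataset.x_train)

# Instantiate the matching behavioral suite and run all tests.
rec_list = CoveoCartRecList(model=model, dataset=coveo_dataset)
rec_list(verbose=True)
```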
If you are not familiar with the library, we suggest first taking our short tour to get acquainted with the main abstractions through ready-made models and public datasets.