State-of-the-Art Algorithms

"What is the state-of-the-art recommendation algorithm?" is a question that should be a no-brainer for any recommender-system researcher or developer. However, the answer is typically "I don't know, at least not for sure."

The recommender-system community faces a reproducibility crisis that makes it almost impossible to say which algorithms are truly state-of-the-art. In the recent paper "Are we really making much progress? A worrying analysis of recent neural recommendation approaches", the authors found that 11 of 18 algorithms (61%) could not be reproduced at all. For the remaining 7 algorithms (39%), the authors managed to reproduce the results, but:

6 of [the algorithms could] often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned nonneural linear ranking method.

Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach

In short: none of the 18 novel algorithms led to a real improvement over relatively simple baselines.
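The "comparably simple heuristic methods" the quote refers to can be as basic as item popularity or nearest neighbors over co-occurring items. Here is a minimal sketch of both on a toy implicit-feedback log (the data and function names are illustrative, not taken from the paper):

```python
from collections import Counter, defaultdict

# Toy implicit-feedback data (hypothetical): user -> set of clicked items.
interactions = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "d"},
}

def most_popular(interactions, exclude, k=2):
    """Rank items by global click count, skipping already-seen items."""
    counts = Counter(i for items in interactions.values() for i in items)
    return [i for i, _ in counts.most_common() if i not in exclude][:k]

def item_knn(interactions, profile, k=2):
    """Score unseen items by how often they co-occur with the user's items."""
    scores = defaultdict(int)
    for items in interactions.values():
        if items & profile:
            for i in items - profile:
                scores[i] += len(items & profile)
    return [i for i, _ in sorted(scores.items(), key=lambda x: -x[1])][:k]

print(most_popular(interactions, exclude={"a"}))
print(item_knn(interactions, profile={"a"}))
```

Baselines like these take a few lines to implement and tune, which is exactly why a neural method that cannot beat them is hard to call state-of-the-art.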

Similarly, in one of our own studies, "Towards reproducibility in recommender-systems research", we showed that recommendation algorithms perform vastly differently in only slightly different scenarios (figure below).

https://link.springer.com/article/10.1007/s11257-016-9174-x

We tested five recommendation algorithms on six German news websites. On almost every website, a different algorithm performed best (and worst): user-based collaborative filtering performed best on ksta.de, the "most popular sequence" algorithm on sport1.de, the plain "most popular" algorithm on ciao.de, and content-based filtering on motor-talk.de.

Even worse, algorithms performed differently at different times of day, for different genders, etc. For instance, between 18:00 and 4:00, the "most popular sequence" algorithm performed best (see below); between 4:01 and 17:59, the standard "most popular" algorithm performed best.

https://link.springer.com/article/10.1007/s11257-016-9174-x
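One practical consequence of such a finding is that a production system could route requests to a different algorithm depending on the time of day. A trivial sketch, using the hour boundaries from our study (the algorithm names are placeholders, not a real API):

```python
def pick_algorithm(hour):
    """Return the name of the algorithm that performed best in the given
    hour window, per the study's finding (boundaries rounded to full hours)."""
    if hour >= 18 or hour < 4:
        return "most popular sequence"
    return "most popular"

print(pick_algorithm(23))  # evening traffic
print(pick_algorithm(12))  # daytime traffic
```

Of course, such a rule is only as good as the evaluation behind it, and the whole point of the study is that these winners shift across websites and contexts.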

So, what can you do? Your best bet is probably to try out as many of the out-of-the-box algorithms implemented in recommender-system libraries as you can. Or use a recommender system as a service and hope that the operators did a good job. Alternatively, have a look at https://paperswithcode.com/task/recommendation-systems to get an idea of which algorithms may perform well. Unfortunately, only a small number of algorithms are listed on Papers with Code.

https://paperswithcode.com/sota/collaborative-filtering-on-movielens-10m
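The "try many algorithms" advice boils down to a benchmark loop: put several recommenders behind a common interface and compare them on the same held-out data. A self-contained sketch with a leave-one-out hit rate (the event log, the two toy recommenders, and the metric are all illustrative, not a specific library's API):

```python
import random
from collections import Counter, defaultdict

random.seed(0)
# Hypothetical implicit-feedback log: (user, item) click events.
events = [("u%d" % u, "i%d" % random.randint(0, 9))
          for u in range(50) for _ in range(5)]

def train_popularity(train):
    """Recommend the globally most-clicked items to every user."""
    counts = Counter(i for _, i in train)
    ranking = [i for i, _ in counts.most_common()]
    return lambda user: ranking

def train_random(train):
    """Recommend items in random order (a sanity-check baseline)."""
    items = list({i for _, i in train})
    return lambda user: random.sample(items, len(items))

def hit_rate_at_k(recommender_factory, events, k=3):
    """Hold out each user's last event and check if it lands in the top-k."""
    by_user = defaultdict(list)
    for u, i in events:
        by_user[u].append(i)
    hits = 0
    for u, items in by_user.items():
        train = [(v, i) for v, js in by_user.items() for i in js if v != u]
        train += [(u, i) for i in items[:-1]]
        recommend = recommender_factory(train)
        hits += items[-1] in recommend(u)[:k]
    return hits / len(by_user)

for name, factory in [("most popular", train_popularity),
                      ("random", train_random)]:
    print(name, round(hit_rate_at_k(factory, events), 2))
```

Swapping in algorithms from an actual library behind the same factory interface is straightforward, and given how context-dependent the results above are, running such a comparison on your own data is worth far more than any leaderboard number.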