Welcome to RS_c, the central platform for the RecSys community. We provide curated lists of recommender-systems datasets, algorithms, books, conferences, and many more resources. Perhaps most importantly, we publish the latest recommender-system news. If you want your news to be reported on RS_c, read here.
There is a plethora of recommender-system datasets and, more generally, almost any machine-learning dataset can be used for recommender systems, too. The de-facto standard dataset for recommendations is probably the MovieLens dataset (which exists in multiple variants). According to a small study that we conducted, 40% of all research papers at the ACM Recommender Systems Conference use the MovieLens dataset (among others). Other popular datasets include the Amazon and Yelp datasets.
Standard Datasets for Beginners and Baselines
Yet to Come…
Large Datasets (many instances and/or many features)
Yet to Come…
Those interested in large-scale, noisy, real-world datasets may want to look at the datasets released as part of the yearly RecSys Challenge: 2020 (Twitter), 2019 (Trivago), 2018 (Spotify), 2017 (XING), and 2016 (XING, CrowdRec, MTA Sztaki).
In 2018, Spotify co-organized the ACM RecSys Challenge and provided a massive dataset of 1 million playlists consisting of 2 million tracks by around 300,000 artists. A few days ago, Ching-Wei Chen from Spotify announced that the dataset will be re-released together with an open-ended challenge on AICrowd. This seems to be a great resource for recommender-systems […]
Finding recommender-system datasets is a challenge. The survey by Chapman et al. may help by providing a thorough overview of dataset search engines for all kinds of datasets, not only relating to recommender systems. Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to […]
rs_datasets “allows you [to] download, unpack and read recommender systems datasets into pandas.DataFrame as easy as data = Dataset(). The following datasets are available for automatic download and can be retrieved with this package.”
Web Page: https://darel13712.github.io/rs_datasets/
GitHub: https://github.com/Darel13712/rs_datasets/

Dataset | Users | Items | Interactions
Movielens | 162k | 62k | up to 25m
Million Song Dataset | 1m | 385k | 48m
Netflix […]
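For readers who want to check figures such as the number of users, items, and interactions on their own copy of a dataset, here is a minimal sketch in plain Python. It does not use rs_datasets and is not that package's API. It assumes a ratings file in the MovieLens 1M layout (UserID::MovieID::Rating::Timestamp, one interaction per line). The sample rows and the helper names parse_ratings and summarize are illustrative assumptions.

```python
from collections import namedtuple

# One interaction: who rated what, how, and when.
Rating = namedtuple("Rating", ["user_id", "item_id", "rating", "timestamp"])

# Illustrative sample rows in the MovieLens 1M "ratings.dat" layout
# (UserID::MovieID::Rating::Timestamp). Stand-in data, not a real download.
SAMPLE = """\
1::1193::5::978300760
1::661::3::978302109
2::1193::4::978298413
3::2355::5::978824291
"""


def parse_ratings(text, sep="::"):
    """Parse separator-delimited rating lines into Rating tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, ts = line.split(sep)
        rows.append(Rating(int(user), int(item), float(rating), int(ts)))
    return rows


def summarize(rows):
    """Count distinct users, distinct items, and total interactions."""
    return {
        "users": len({r.user_id for r in rows}),
        "items": len({r.item_id for r in rows}),
        "interactions": len(rows),
    }


ratings = parse_ratings(SAMPLE)
stats = summarize(ratings)
print(stats)
```

To run the same summary on a real file, pass the file's contents to parse_ratings instead of SAMPLE. Datasets that use a different separator or column order would need the parsing step adjusted.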