Datasets

There is a plethora of recommender-system datasets and, more generally, almost any machine-learning dataset can be used for recommender systems, too. The de facto standard dataset for recommendations is probably MovieLens (which exists in multiple variations). Based on a small study that we conducted, 40% of all research papers at the ACM Recommender Systems Conference use the MovieLens dataset (among others). Other popular datasets include the Amazon and Yelp datasets.

Popularity of Recommender-System Datasets

Standard Datasets for Beginners and Baselines

Yet to Come…

Large Datasets (many instances and/or many features)

Yet to Come…

‘Real-World’ Datasets

Those interested in large-scale, noisy real-world datasets may want to look at the datasets released as part of the yearly RecSys Challenge: 2020 (Twitter), 2019 (Trivago), 2018 (Spotify), 2017 (XING), and 2016 (XING, CrowdRec, MTA Sztaki).

More Yet to Come…

Special Datasets

Yet to Come…

Dataset Repositories and Search Engines

There are multiple search engines and repositories for recommender-systems (and other) datasets, most notably Google Dataset Search (generic), Kaggle (machine learning), TREC (information retrieval), NTCIR (information retrieval), and the UCI Machine Learning Repository (machine learning).

Other Lists of Datasets

Julian McAuley (UCSD) created a nice list with extracts from the datasets, which give a quick idea of what each dataset looks like.

News About Recommender-System Datasets

Spotify Re-Releases its Million-Playlist Dataset from the RecSys Challenge 2018

In 2018, Spotify co-organized the ACM RecSys Challenge and provided a massive dataset of 1 million playlists consisting of 2 million tracks by around 300,000 artists. A few days ago, Ching-Wei Chen from Spotify announced that the dataset will be re-released and that an open-ended challenge will be created on AICrowd. This seems to be a great resource for recommender-systems […]

Posted in Datasets

Dataset search: a survey [Chapman et al. 2020]

Finding recommender-system datasets is a challenge. The survey by Chapman et al. may help by providing a thorough overview of dataset search engines for all kinds of datasets, not only relating to recommender systems. Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to […]

Posted in Datasets, Literature Surveys

RS_Datasets: Download, Unpack and Read Recommender Systems Datasets into pandas.DataFrame [Darel13712]

rs_datasets “allows you [to] download, unpack and read recommender systems datasets into pandas.DataFrame as easy as data = Dataset(). The following datasets are available for automatic download and can be retrieved with this package.”

Web Page: https://darel13712.github.io/rs_datasets/
GitHub: https://github.com/Darel13712/rs_datasets/

Dataset | Users | Items | Interactions
Movielens | 162k | 62k | up to 25m
Million Song Dataset | 1m | 385k | 48m
Netflix […]
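The Users, Items and Interactions figures quoted for each dataset can be recomputed for any ratings file. The following is a minimal sketch, not part of rs_datasets: it assumes tab-separated rows of user id, item id, rating and timestamp (the layout of the MovieLens 100K "u.data" file). The sample rows and the function name dataset_stats are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows in the assumed layout:
# user_id <TAB> item_id <TAB> rating <TAB> timestamp
SAMPLE = (
    "1\t10\t4\t881250949\n"
    "1\t20\t3\t881250950\n"
    "2\t10\t5\t881250951\n"
    "3\t30\t2\t881250952\n"
)


def dataset_stats(lines):
    """Count distinct users, distinct items and interactions in tab-separated ratings."""
    users, items = set(), set()
    interactions = 0
    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        users.add(row[0])
        items.add(row[1])
        interactions += 1
    n_users, n_items = len(users), len(items)
    # Density: share of the user-item matrix that holds an observed interaction.
    density = interactions / (n_users * n_items) if n_users and n_items else 0.0
    return {
        "users": n_users,
        "items": n_items,
        "interactions": interactions,
        "density": density,
    }


stats = dataset_stats(io.StringIO(SAMPLE))
print(stats)
```

To run this on a real file, pass an open file handle instead of the io.StringIO sample, for example dataset_stats(open("u.data", newline="")), where the path is whatever location the ratings file was downloaded to.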

Posted in Datasets, Software Libraries & APIs