Automatic Podcast Creation: Reproducibility in Recommender Systems Research
Google released a fantastic tool that creates podcasts from notes, slides, and … research papers! With NotebookLM, you upload content to Google; NotebookLM analyzes and summarizes it and automatically generates, for example, an interview or a discussion between two hosts about the content you uploaded. For scientists, this is a great way to disseminate their research results, or to get an entertaining summary of the many research papers they always wanted to read but never found the time for.
We tried NotebookLM to create a discussion between two people about reproducibility in recommender systems. The result is impressive (not perfect, but very good).
Speaking of disseminating one’s research results, the primary input to the discussion was our own research:
- Beel, Joeran; Breuer, Timo; Crescenzi, Anita; Fuhr, Norbert; Li, Meije. Results-blind Reviewing. In: Bauer, Christine; Carterette, Ben; Ferro, Nicola; Fuhr, Norbert; Faggioli, Guglielmo (Ed.): Frontiers of Information Access Experimentation for Research and Education (Dagstuhl Seminar 23031), pp. 68-154, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023.
- Beel, Joeran; Jannach, Dietmar; Said, Alan; Shani, Guy; Vente, Tobias; Wegmeth, Lukas. Best-Practices for Offline Evaluations of Recommender Systems. In: Bauer, Christine; Said, Alan; Zangerle, Eva (Ed.): Report from Dagstuhl Seminar 24211 – Evaluation Perspectives of Recommender Systems: Driving Research and Education, 2024.
- Beel, Joeran. A Call for Evidence-based Best-Practices for Recommender Systems Evaluations. In: Bauer, Christine; Said, Alan; Zangerle, Eva (Ed.): Report from Dagstuhl Seminar 24211 – Evaluation Perspectives of Recommender Systems: Driving Research and Education, 2024.
- Beel, Joeran; Wegmeth, Lukas; Michiels, Lien; Schulz, Steffen. Informed Dataset Selection with ‘Algorithm Performance Spaces’. In: 18th ACM Conference on Recommender Systems, pp. 1085–1090, Association for Computing Machinery, Bari, Italy, 2024, ISBN: 9798400705052.
- Vente, Tobias; Ekstrand, Michael; Beel, Joeran. Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit. In: Proceedings of the 17th ACM Conference on Recommender Systems, pp. 1212-1216, 2023.
- Beel, Joeran; Brunel, Victor. Data Pruning in Recommender Systems Research: Best-Practice or Malpractice? In: 13th ACM Conference on Recommender Systems (RecSys), pp. 26–30, CEUR-WS, 2019.
- Beel, Joeran; Breitinger, Corinna; Langer, Stefan; Lommatzsch, Andreas; Gipp, Bela. Towards Reproducibility in Recommender-Systems Research. In: User Modeling and User-Adapted Interaction (UMUAI), vol. 26, no. 1, pp. 69-101, 2016.
- Wegmeth, Lukas; Vente, Tobias; Purucker, Lennart; Beel, Joeran. The Effect of Random Seeds for Data Splitting on Recommendation Accuracy. In: Proceedings of the 3rd Perspectives on the Evaluation of Recommender Systems Workshop, 2023.
- Langer, Stefan; Beel, Joeran. The Comparability of Recommender System Evaluations and Characteristics of Docear’s Users. In: Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys), pp. 1–6, CEUR-WS, 2014.
- Scheidt, Teresa; Beel, Joeran. Time-dependent Evaluation of Recommender Systems. In: Perspectives on the Evaluation of Recommender Systems Workshop, ACM RecSys Conference, 2021.
To make the discussion a bit broader, we also added a few papers by Dietmar Jannach, Alan Said, and colleagues:
- Jannach, D., Lerche, L., Kamehkhosh, I., & Jugovac, M. (2015). What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Modeling and User-Adapted Interaction, 25, 427-491.
- Said, A., & Bellogín, A. (2015, September). Replicable evaluation of recommender systems. In Proceedings of the 9th ACM Conference on Recommender Systems (pp. 363-364).
- Ferrari Dacrema, M., Boglio, S., Cremonesi, P., & Jannach, D. (2021). A troubling analysis of reproducibility and progress in recommender systems research. ACM Transactions on Information Systems (TOIS), 39(2), 1-49.
- Said, A., & Bellogín, A. (2014, October). Comparative recommender system evaluation: benchmarking recommendation frameworks. In Proceedings of the 8th ACM Conference on Recommender Systems (pp. 129-136).
- Bellogín, A., & Said, A. (2021). Improving accountability in recommender systems research through reproducibility. User Modeling and User-Adapted Interaction, 31(5), 941-977.
We then gave NotebookLM the following prompt:
Discuss reproducibility of recommender systems research for 30 mins. Target new PhD students. Focus on giving hands-on advice on random seeds; Auto-RecSys; dataset selection and algorithm performance spaces; dataset pruning; importance of time, user characteristics, algorithms implementations in frameworks; worrying status of community. Discuss recent developments about checklists and results blind reviews (which are now adopted by the ACM TORS Journal). Always mention author names.
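As an aside, for readers unfamiliar with the random-seed issue the prompt refers to (the topic of the Wegmeth et al. paper listed above): the seed that determines the train/test split can, by itself, change the measured accuracy. Below is a minimal, hypothetical sketch of that effect; the toy ratings and the trivial mean-predicting baseline are our own illustration, not taken from the paper.

```python
# Minimal sketch (toy data): the random seed of the train/test split
# alone changes the reported accuracy of an otherwise identical pipeline.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=1000).astype(float)  # toy ratings in 1..5

for seed in (1, 7, 42):
    train, test = train_test_split(ratings, test_size=0.2, random_state=seed)
    prediction = train.mean()  # trivial "recommender": predict the global mean
    rmse = np.sqrt(np.mean((test - prediction) ** 2))
    print(f"seed={seed}: RMSE={rmse:.4f}")  # the RMSE differs across seeds
```

Reporting results over multiple seeds, or fixing and publishing the seed, is the simple remedy.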
The output is a 20-minute podcast, and frankly, it’s really good. There are some problems, though. Several papers and topics were left out despite explicit instructions to discuss them. Some concepts (e.g., Algorithm Performance Spaces for dataset selection) were described incorrectly. Author names were rarely mentioned. Towards the end, the podcast becomes quite repetitive. Nevertheless, it is probably as good as the average podcast.
What do you think?