Project Ideas for Bachelor/Master/PhD theses

The key to a successful Bachelor's, Master's, or PhD thesis is an interesting topic. Ideally, your supervisor has an inspiring idea, but if not, feel free to browse the following ideas. If you choose one of them, please credit the author of the idea in your thesis.

If you are a researcher and would like to present some of your ideas, please contact us. We need the following from you:

  • Title
  • Short 1-3 paragraph description
  • Link to a more comprehensive description if available
  • Details on whether you might be interested in co-supervising, or if you just want the idea to be listed here hoping that someone will use it (without further input from your side).

It’s Time to Consider Time in Recommender-System Research

Suggested by: Prof Joeran Beel, Trinity College Dublin (interested in co-supervision)

Recommender-system evaluation is an actively discussed topic in the recommender-system community. Discussions include advantages and disadvantages of evaluation methods such as online evaluations, offline evaluations, and user studies [1–4]; the ideal metrics to measure recommendation effectiveness [5–8]; and ensuring reproducibility [9–11]. In recent years, several workshops on recommender-system evaluation were held and journals published several special issues [11–14]. An issue that has received (too) little attention is the question of whether presenting results as a single number is sufficient or whether metrics should be presented for time intervals. Typically, researchers calculate a few metrics for each algorithm (e.g. precision p, normalized discounted cumulative gain nDCG, root mean square error RMSE, mean absolute error MAE, coverage c, or serendipity s). For each metric, they present a single number such as p = 0.38, MAE = 1.02, or c = 97%, i.e. the metrics are calculated based on all data available. Hence, the metrics express how well an algorithm performed on average over the period of data collection, which is often rather long. For instance, the data in the MovieLens 20M dataset was collected over ten years [15]. This means that when a researcher reports that an algorithm has, e.g., an RMSE of 0.82 on the MovieLens dataset, the algorithm had that RMSE on average over ten years.

We argue that presenting a single number that expresses the overall average is problematic, as an average provides only a broad and static view of the data. If someone were asked how an algorithm had performed over time (i.e. before, during, and after the data collection period), the best guess, based on a single number, would be that the algorithm had the same effectiveness all the time. We argue that such an assumption is naïve, as the effectiveness of many algorithms is not stable over time. It is well known that the effectiveness of many recommendation algorithms depends on the number of users, items, and ratings, as well as on algorithm parameters such as neighborhood size or user model size [16–19].

We propose that, instead of a single number, recommender-systems researchers should present metrics as time series, i.e. each metric should be calculated for a certain interval of the data collection period, e.g. for every day, week, or month. This would make it possible to gain more information about an algorithm’s effectiveness over time, identify trends, make better predictions of how an algorithm will perform in the future, and hence draw more meaningful conclusions about which algorithms to deploy in a recommender system.
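As a minimal sketch of this idea, the following Python snippet computes RMSE per calendar month next to the usual single overall value. The record format (Unix timestamp, true rating, predicted rating), the function names, and the choice of monthly intervals are assumptions made for illustration; they are not taken from the proposal.

```python
from collections import defaultdict
from datetime import datetime, timezone
from math import sqrt


def overall_rmse(records):
    """Single-number RMSE over all records (the common way of reporting)."""
    errors = [(actual - predicted) ** 2 for _, actual, predicted in records]
    return sqrt(sum(errors) / len(errors))


def rmse_per_month(records):
    """RMSE for each calendar month, keyed by 'YYYY-MM' in chronological order.

    records: iterable of (unix_timestamp, true_rating, predicted_rating).
    This record layout is an assumption for the sketch.
    """
    squared_errors = defaultdict(list)
    for timestamp, actual, predicted in records:
        month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
        squared_errors[month].append((actual - predicted) ** 2)
    return {
        month: sqrt(sum(errors) / len(errors))
        for month, errors in sorted(squared_errors.items())
    }


def _ts(year, month, day):
    """Helper for the toy example: UTC date to Unix timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Toy records, made up purely to show the call pattern.
toy_records = [
    (_ts(2015, 1, 10), 4.0, 3.0),
    (_ts(2015, 1, 20), 2.0, 3.0),
    (_ts(2015, 2, 5), 5.0, 3.0),
]
monthly = rmse_per_month(toy_records)
overall = overall_rmse(toy_records)
```

In this toy data the overall value hides that the error in the second month is twice the error in the first, which is the kind of trend a per-interval report would expose.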

Read more: https://www.researchgate.net/publication/319349809_It’s_Time_to_Consider_Time_when_Evaluating_Recommender-System_Algorithms_Proposal

User Interfaces for Recommender-Systems

Suggested by: Prof Joeran Beel, Trinity College Dublin (interested in co-supervision)

User interfaces can have a major impact on user satisfaction with a website or application. This is well known and well researched in many fields, such as search engines. The recommender-system community also acknowledges that user interfaces may affect user satisfaction. However, empirical evidence of how strong this impact is remains rare.

Your task would be to implement different user interfaces in our recommender-system-as-a-service, Darwin & Goliath, and to A/B test how the interfaces affect click-through rate and other KPIs.
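One possible way to compare the click-through rates of two interface variants is a two-proportion z-test. The sketch below uses only the Python standard library; the function name, the choice of test, and the click and impression counts are assumptions for illustration and do not reflect how Darwin & Goliath records or analyzes data.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Compare the click-through rates of variants A and B.

    Returns (z, p_value) for a two-sided test with a pooled standard error.
    """
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    if pooled in (0.0, 1.0):
        raise ValueError("test is undefined when no or all impressions were clicked")
    standard_error = sqrt(
        pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b)
    )
    z = (ctr_b - ctr_a) / standard_error
    # Standard normal CDF expressed via the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Made-up counts, only to show the call pattern.
z_same, p_same = two_proportion_z_test(50, 1000, 50, 1000)
z_diff, p_diff = two_proportion_z_test(50, 1000, 80, 1000)
```

A positive z means variant B had the higher click-through rate. In a real study, the sample size and the significance threshold should be fixed before the experiment starts.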

Meta-Learned Algorithm Selection for Machine Learning, Information Retrieval or NLP Systems

Suggested by: Prof Joeran Beel, Trinity College Dublin (interested in co-supervision)

We recently presented the ideas of a “macro recommender system” and a “micro recommender system”. Both can be considered recommender systems for recommendation or machine learning algorithms. A macro recommender system recommends the potentially best-performing recommendation algorithm to an organization that wants to build a recommender system. This way, an organization does not need to test dozens of algorithms or more to find the best one for its particular platform (e.g. a news website or digital library). A micro recommender system recommends the potentially best-performing recommendation algorithm for each individual recommendation request. This proposal is based on the premise that there is no single best algorithm for all types of users, items, and contexts. For instance, a micro recommender system might recommend using algorithm A when recommendations are to be created for an elderly male user in the evening. When recommendations are to be given to a young female user in the morning, the micro recommender system might recommend a different algorithm. For more details, see: https://www.researchgate.net/publication/322138236_A_MacroMicro_Recommender_System_for_Recommendation_Algorithms_Proposal

Deep Citation Proximity: Learning Citation-Based Document Relatedness for Uncited Documents

Suggested by: Prof Joeran Beel, Trinity College Dublin (interested in co-supervision)

In this project, you will train a deep neural network (probably a Siamese neural network) to learn how related two documents are. The training will use co-citations as ground truth. By doing this project, you will gain advanced knowledge of machine learning and recommender systems. The project can be done in the context of a recommender system for research papers or a recommender system for websites (websites in general, Wikipedia, news, …).
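To illustrate what co-citation ground truth could look like, the following sketch counts how often two documents are cited together by the same citing document. Plain co-citation counts are a simplification chosen for this example; the linked proposal may weight co-citations by their proximity in the text. The input format and all document identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(reference_lists):
    """Count, for each pair of cited documents, how many citing documents cite both.

    reference_lists: dict mapping a citing document ID to the list of
    document IDs it cites (this input format is an assumption for the sketch).
    """
    counts = Counter()
    for cited in reference_lists.values():
        # Deduplicate and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing documents p1..p3 and cited documents a..c.
toy_references = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c", "b"],
}
pair_counts = co_citation_counts(toy_references)
```

Pairs with high counts could serve as positive training examples for the network and pairs with low or zero counts as negatives. The network would then predict relatedness from document content alone, which is what makes the approach applicable to uncited documents.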

To learn more about this project, please read this proposal: https://www.scss.tcd.ie/joeran.beel/pubs/2017%20–%20Virtual%20Citation%20Proximity%20(VCP)%20Calculating%20Co-Citation-Proximity-Based%20Document%20Relatedness%20for%20Uncited%20Documents%20with%20Machine%20Learning.pdf

AutoRecSys/AutoIR/AutoNLP: Automated Recommender Systems / Information Retrieval / Natural Language Processing

Suggested by: Prof Joeran Beel, Trinity College Dublin (interested in co-supervision)

In recent years, the AutoML (Automated Machine Learning) community has made huge advances in automating the entire machine learning pipeline, including algorithm selection and configuration. The recommender-systems and information-retrieval communities, and to some extent the NLP community, have fallen behind when it comes to optimizing and automating their workflows.

The goal of this project would be to transfer the tools and methods from AutoML to RecSys (AutoRecSys), IR (AutoIR), or Natural Language Processing (AutoNLP). The outcome could be new or enhanced recommendation, IR, or NLP frameworks similar to Auto-WEKA or auto-sklearn, or simply methods and strategies to automatically select and configure RecSys, IR, or NLP algorithms.

A very simple strategy could be: take an existing framework, e.g. Mahout or Surprise (http://surpriselib.com/), identify the best algorithm configurations for a number of datasets (take results e.g. from Kaggle competitions, or run various algorithms and parameters yourself), and create a kind of dictionary. If a user later wants to run the library on dataset x, the new framework recommends the configuration that performed best before. Of course, ideally, this would not be a dictionary but a meta-learning approach.
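The dictionary strategy described above can be sketched in a few lines of Python. The dataset names, algorithm names, parameters, and RMSE values below are placeholders invented for illustration, not measured results, and the functions are not part of Mahout or Surprise.

```python
def build_config_dictionary(results):
    """Keep, for each dataset, the configuration with the lowest RMSE.

    results: iterable of (dataset, algorithm, params, rmse) tuples, e.g.
    collected from earlier experiments (this tuple layout is an assumption).
    """
    best = {}
    for dataset, algorithm, params, rmse in results:
        if dataset not in best or rmse < best[dataset][2]:
            best[dataset] = (algorithm, params, rmse)
    return best


def recommend_config(best, dataset):
    """Return (algorithm, params) that performed best on this dataset, or None."""
    entry = best.get(dataset)
    if entry is None:
        return None  # unseen dataset: a plain dictionary cannot generalize
    algorithm, params, _ = entry
    return algorithm, params


# Placeholder results; every name and number here is hypothetical.
past_results = [
    ("dataset_x", "algo_A", {"k": 20}, 0.95),
    ("dataset_x", "algo_B", {"factors": 50}, 0.91),
    ("dataset_y", "algo_A", {"k": 40}, 1.02),
]
config_dictionary = build_config_dictionary(past_results)
recommended = recommend_config(config_dictionary, "dataset_x")
unseen = recommend_config(config_dictionary, "dataset_z")
```

The None returned for an unseen dataset shows the limitation of the lookup. A meta-learning approach would instead predict a configuration from dataset characteristics, such as the number of users, items, and ratings.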