Call for Papers: Challenges in Modern Multimodal Recommender Systems (ACM TORS)
Multimodal recommendation is one of those ideas that sounds obvious until one looks more closely. Of course recommender systems should use more than one signal. Why rely only on clicks or ratings when one can also exploit text, images, audio, video, metadata, knowledge graphs, or behavioral traces? And yet, as soon as multiple modalities enter the picture, the problem becomes substantially harder. More information does not automatically mean better recommendation. Sometimes it simply means more noise, more inconsistency, and more opportunities for a system to fail.
This is precisely why the Special Issue on Challenges in Modern Multimodal Recommender Systems in ACM Transactions on Recommender Systems is so timely. The call is refreshingly clear that modern multimodal recommenders are not merely a fashionable extension of classical models. They now sit at the core of e-commerce, entertainment, news, social media, and personalized content delivery. At the same time, they face serious problems in design, deployment, and evaluation that are still far from solved.
There is also an important conceptual point here. The call carefully distinguishes personalized recommendation from generic multimedia retrieval. That distinction matters. A system may be very good at retrieving visually or semantically similar items and still be poor at recommendation, because recommendation is not only about similarity. It is about relevance for a user, in a context, over time. I am glad to see the special issue explicitly invite work on both multimodal candidate retrieval and personalized ranking, because in real systems the two belong together.
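To make that distinction concrete, here is a minimal sketch of the two-stage view, entirely my own illustration rather than anything prescribed by the call: stage one retrieves candidates by plain multimodal similarity, stage two re-ranks them with user- and context-specific signals. The embeddings, the functions retrieve_candidates and rank_for_user, and the "recency" signal are all invented for this example.

```python
import numpy as np

# Hypothetical inputs: content embeddings (e.g. fused text+image vectors) for the
# item catalogue, plus a learned preference vector for one user.
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(10_000, 64))   # one row per catalogue item
user_vector = rng.normal(size=64)                 # the user's preference embedding
recency_boost = rng.uniform(size=10_000)          # stand-in contextual signal

def retrieve_candidates(query_vec, item_vecs, k=100):
    """Stage 1: generic multimodal retrieval -- pure similarity, no user context."""
    sims = item_vecs @ query_vec / (
        np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k], sims

def rank_for_user(candidates, sims, context_signal, k=10):
    """Stage 2: personalized ranking -- re-score candidates with user/context features."""
    # Toy scoring rule: content similarity blended with a contextual signal.
    scores = 0.7 * sims[candidates] + 0.3 * context_signal[candidates]
    return candidates[np.argsort(-scores)[:k]]

candidates, sims = retrieve_candidates(user_vector, item_embeddings)
top_items = rank_for_user(candidates, sims, recency_boost)
print(top_items)
```

The point of the toy is simply that the two stages optimize different things: stage one can be excellent at similarity while stage two is what actually decides relevance for this user, in this context.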
Another strength of the call is its insistence on evaluation. Too much work in recommender systems still relies on static offline metrics as if they were the whole story. For multimodal systems this is especially problematic. Accuracy on a benchmark split may reveal very little about robustness, fairness, transparency, user trust, or long-term effects. Online evaluation, feedback loops, exposure bias, and system-induced behavioral effects are not side issues here. They are central challenges. In that sense, this special issue addresses not only model building, but also the more mature question of how multimodal recommenders behave in the wild.
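As one illustration of why exposure bias makes offline numbers slippery, here is a toy sketch, again my own and run on synthetic data rather than anything from the call: it compares a naive click-rate estimate with a simple inverse-propensity-scored estimate of how a hypothetical new policy would perform. The logging propensities and the new policy are made up for the example, and real estimators of this kind also need variance control.

```python
import numpy as np

rng = np.random.default_rng(1)
n_logs, n_items = 5_000, 20

# Hypothetical interaction log: which item the deployed system showed, the
# probability with which it showed it (its propensity), and whether the user clicked.
logged_item = rng.integers(0, n_items, size=n_logs)
propensity = rng.uniform(0.02, 0.2, size=n_logs)
clicked = rng.binomial(1, 0.1, size=n_logs)

# A new (hypothetical) recommender to be evaluated offline; for simplicity it
# deterministically picks one item per impression.
new_policy_item = rng.integers(0, n_items, size=n_logs)

# Naive estimate: average click rate on the log, ignoring who was exposed to what.
naive_ctr = clicked.mean()

# Inverse-propensity-scored (IPS) estimate: keep only impressions where the new
# policy agrees with the logged one, reweighted to correct for the old system's
# exposure bias.
match = (new_policy_item == logged_item).astype(float)
ips_ctr = np.mean(match * clicked / propensity)

print(f"naive CTR on the log: {naive_ctr:.3f}, IPS estimate for new policy: {ips_ctr:.3f}")
```

The naive number describes the old system's exposure pattern, not the new policy; the reweighted estimate at least targets the right quantity, at the cost of higher variance.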
The scope is broad and well chosen. The editors invite work on data-centric challenges, multimodal representation learning, robust fusion, cold start and long-tail recommendation, evaluation and benchmarking, online experimentation, deployment and MLOps, human-centered and ethical questions, and the growing role of generative and foundation models. I was pleased to see privacy, copyright, content moderation, fairness, and explainability treated as integral parts of the agenda rather than polite afterthoughts. That is exactly right. A modern multimodal recommender system is not only a prediction machine. It is a socio-technical system that must justify its behavior.
The guest editors are Yubin Kim, Daniele Malitesta, Alberto Carlo Maria Mancino, Claudio Pomo, and Shah Nawaz. This is a strong and international editorial team, and it suggests that the issue aims to combine methodological depth with practical relevance. The call also explicitly welcomes both foundational research and applied industrial work, which is another good sign. Multimodal recommendation is one of those areas where theory and deployment really should speak to each other more often.
The deadline for submission is June 1, 2026. First-round decisions are expected by August 1, 2026, revision submissions are due on October 1, 2026, and final decisions are planned for February 1, 2027. The issue also welcomes extended versions of strong conference papers, provided they contain substantial new material.
Overall, this looks like an excellent call. Multimodal recommender systems have become central to the field, but their challenges are often underestimated. They promise richer personalization, yet they also expose the limits of our models, datasets, and evaluation habits. That makes them difficult. But it also makes them scientifically interesting. For exactly that reason, this special issue should attract considerable attention.

