ACM’s New AI Policy Is Almost Right: AI-Written Papers Are Not the Problem. Unaccountable Papers Are!

June 9, 2026

ACM has changed its authorship policy on AI, and the core message is simple: AI-assisted writing no longer needs to be disclosed. Authors may use AI to help write an ACM submission, and ACM does not require a statement about that use. However, ACM still requires disclosure when AI is used in the research process itself, for example, in methodology, data creation, coding, simulations, analysis, validation, figures, or other artefacts that directly support the paper’s conclusions. Most importantly, all named authors remain fully responsible for the entire work, regardless of whether a problematic sentence, citation, dataset, or figure came from a human, an LLM, or some unholy late-night combination of both.

I think this is the right move. In fact, I think it is the only realistic move. And my personal view goes even further than ACM’s current policy. I do not think AI use should require special disclosure, even when it is used for experiments, code, or analysis. Of course, authors must describe their methods, data, code, parameters, and reproducibility-relevant artefacts. But “we used AI” should not become a ritual confession. In ten years, people will look back and laugh: “Researchers disclosed that they used AI to write source code? Why? Of course they did.” It will sound like disclosing that one used Python, Excel, Grammarly, a spell checker, or an IDE.

My Perspective: RecSys 2027, AutoRecSys, and AutoRecLab

I write this as someone who is not neutral. I will serve as one of the general co-chairs of ACM RecSys 2027. I am also a strong advocate of automating recommender-systems research. AutoRecSys, AutoML for RecSys, AI4Science, and what we called AutoRecLab are not side topics for me. They are part of where I believe our field must go.

In our AutoRecLab paper, we argued that recommender-systems research should move beyond narrow automation of algorithm selection and hyperparameter tuning. We called for autonomous recommender-systems research labs that support the full research lifecycle: ideation, literature analysis, experimental design, code execution, result interpretation, manuscript drafting, provenance logging, and eventually community-level governance. That argument becomes even more relevant now. If we believe that research itself can be automated, then drawing a sacred boundary around writing is inconsistent. Writing is not a separate temple. It is one part of the research pipeline.

The Survey: The Direction Is Clear

The survey of related work shows a clear pattern. Across publishers, journals, and conferences, the emerging baseline is that AI cannot be an author, humans remain accountable, substantive AI use is often disclosed, and confidentiality is the central issue in peer review. The differences are mostly about how much experimentation communities allow. Machine-learning conferences are moving fastest. Publishers and biomedical journals are more conservative. But almost nobody is moving toward “no AI.” The direction is toward AI-supported scholarly work.

This is already visible in conference experiments. To name a few: ICLR 2025 ran an LLM-feedback pilot for reviewers. ICML 2026 introduced reviewer-choice policies for LLM use and also tested Google’s Paper Assistant Tool for authors. NeurIPS 2026 experimented with author-side checklist assistance and later launched author-side and reviewer-side AI support. These systems are not framed as replacements for human reviewers or decision makers. They are framed as support tools. But that distinction is temporary in the practical sense: once support tools become good enough, not using them will feel strange.

The evidence is not perfect, but it is already strong enough to show where things are heading. ICLR 2025 reports that it sent optional LLM feedback on 18,946 reviews; 26.6% of reviewers updated their reviews, and blinded evaluators preferred the assisted version in 89% of assessed cases. ICML’s Paper Assistant Tool reportedly gave feedback to roughly 4,500 papers, with 92.1% of survey respondents saying they would use it again. The survey also reports that 31% of experimental papers ran new experiments after receiving feedback, and 35.4% of theory papers reported theory gaps that took more than an hour to fix. ICML itself describes PAT as private, automated, actionable feedback before submission.

Of course there are failures. There are hallucinated citations, vague reviews, confidentiality risks, bias, over-reliance, prompt injection, gaming, and floods of low-value submissions. But this is what early infrastructure looks like. The steam engine exploded sometimes. Early spreadsheets contained bugs. Early recommender systems recommended embarrassing things. We did not conclude that engines, spreadsheets, or recommender systems should be banned. We built better versions.

The Current Backlash: NeurIPS and arXiv

This does not mean the topic is settled. Quite the opposite. NeurIPS 2026 recently announced that papers in its Position Paper Track must be substantially human-written, with AI limited to copy-editing or similar peripheral changes. arXiv has also changed its practice for review articles and position papers in the computer-science category. Such papers must now already have been accepted through peer review before arXiv will consider them.

These reactions are understandable. Position papers and surveys are especially easy to mass-produce with LLMs. They often require synthesis, judgment, framing, and taste. They also create review and moderation costs. If thousands of AI-generated position papers flood a venue or preprint server, the burden does not disappear. It moves to reviewers, area chairs, moderators, and readers. The result is a tax on everyone’s attention.

Still, I think the restrictive response is the wrong long-term answer. It treats the symptom rather than the system. The problem is not that AI can write a survey. The problem is that our publication infrastructure is not prepared for near-zero marginal-cost manuscript production. Banning or stigmatizing AI writing may reduce the pressure for a while. But it will also reward concealment, create brittle enforcement problems, and distract from the harder task: building better filters, better provenance, better review support, and better incentives.

Why ACM Is Right

ACM’s new policy is good because it focuses on responsibility. A paper should be judged by whether its claims are correct, its evidence is sound, its methods are reproducible, its citations are real, and its contribution matters. The author is responsible for all of that. The author cannot blame the LLM, just as the author cannot blame Python, Excel, TensorFlow, LaTeX, or a confused co-author.

This matters because disclosure can easily become symbolic rather than useful. “We used ChatGPT to improve the writing” tells the reader almost nothing. Did it invent a citation? Did it help structure the argument? Did it rewrite every sentence? Did it merely remove typos? Most such disclosures do not improve scientific evaluation. They mostly signal virtue, guilt, or compliance.

My view is that even ACM’s remaining distinction between writing and research use will age badly. Today, ACM still requires disclosure when AI is used for coding, experiments, simulations, analysis, or artefacts relevant to the conclusions. Authors must follow that policy when submitting to ACM. But as a matter of principle, I think the special treatment of “AI” will soon become obsolete. In many cases, authors may not even know whether AI was involved. An IDE may complete code using an AI model. A cloud notebook may suggest a data-cleaning step. A statistics package may internally choose an optimization strategy based on learned heuristics. A database system may rewrite queries. A reference manager may use AI to extract metadata. A plotting tool may recommend chart types. A spreadsheet may auto-detect patterns, fill missing values, or suggest formulas. A translation tool, grammar checker, OCR system, search engine, or PDF parser may rely on AI somewhere in the pipeline. Should all of that be disclosed? At some point, the question becomes meaningless.

We should disclose what matters for reproducibility and interpretation, not every tool category that happened to contain a neural network. If a model generated synthetic data, that matters. If an optimizer automatically selected parameters, that may matter. If an AutoML system searched over model families, that may matter. But if a tool wrote boilerplate code that was tested, reviewed, and archived, the relevant fact is the code and its provenance, not whether Copilot, ChatGPT, an IDE template, Stack Overflow, or a doctoral student typed the first draft.

Not using AI will soon look irresponsible in many settings. Writing complex code in a plain text editor rather than an IDE is possible, but rarely admirable. Calculating complex statistics by hand because Excel or Python is a “black box” is not scientific rigor. It is nostalgia with extra arithmetic. The serious question is not whether AI was used. The serious question is whether the resulting artefact is correct, inspectable, tested, and reproducible.

The RecSys Lesson: Automation Improves Research

This is exactly what we saw with LensKit-Auto. In that work, we found that 63.6% of surveyed papers using LensKit did not optimize hyperparameters or did not report doing so. That is not a small technicality. Poorly tuned baselines can distort scientific conclusions. LensKit-Auto automated algorithm selection, hyperparameter optimization, preprocessing, and post-hoc ensembling. It did not remove responsibility from researchers. It made better experimental practice easier.

Our AutoML-for-RecSys paper made the broader point. We compared 60 AutoML, AutoRecSys, ML, and RecSys algorithms on 14 explicit-feedback datasets. The overall ranking was AutoRecSys, then AutoML, then RecSys, then ML, then the baseline. AutoML libraries performed best on six datasets, while Auto-Surprise was best on five. The result was not that humans are useless. The result was that automation can help avoid bad defaults and make competent experimentation more accessible.

AutoRecLab is the logical continuation. A future RecSys research agent may recommend hypotheses, select datasets, identify baselines, tune models, run robustness checks, generate figures, draft text, and prepare reproducibility packages. That is not alien to recommender systems. It is a recommender system: recommending research actions under constraints, uncertainty, feedback, and multiple objectives. This also shows why generic “AI use” disclosure will become increasingly unhelpful. In such an AutoRecLab pipeline, AI is not a single tool that is switched on once and then reported in a footnote. It is part of the research environment. It may support dataset selection, baseline choice, experiment scheduling, code generation, hyperparameter optimization, figure creation, and text drafting, often through several interacting components. Disclosing that “AI was used” would be as informative as saying that “software was used.”

From AI Disclosure to Research Provenance

Our AutoRecLab discussion also shows why even ACM’s improved policy will only be an intermediate step. To be clear, ACM does not merely ask authors to write “AI was used.” The new policy is more serious than that. If AI is used in the research process itself, ACM asks authors to describe these uses in detail in the methods section. That is much better than a vague disclosure sentence.

However, my concern is that even this level of disclosure will soon be too small for the phenomenon it tries to describe. In a real AutoRecLab-style workflow, AI is not a single tool used at a single moment. It may be a chain of agents and systems that help define the research question, search the literature, assess novelty, select datasets, choose baselines, generate code, tune hyperparameters, run experiments, detect failed runs, summarize results, create figures, draft the manuscript, and revise the rebuttal. At that point, even a careful methods paragraph may become mostly useless. It will be like writing: “software was used to conduct the research.”

In a fully automatic AI Science pipeline, authors may not even know in detail how AI was used. They may know the input, the output, the system configuration, and the logs. But they may not know every internal decision that a research agent made. This is not entirely new. Most researchers do not know exactly how Excel performs every calculation, how Python packages implement every function, how a database engine rewrites a query, or how a numerical library handles floating-point edge cases. We usually trust these tools because they are mature, tested, widely used, and because we can inspect the inputs and outputs when needed. We are not yet there with AI research agents. Maybe we will never trust them in the same way we trust Excel or NumPy. But the direction is clear: as tools become more complex, disclosure must move from “which kind of tool was used?” to “what process produced this result, and can we inspect it?”

In our AutoRecLab paper, we therefore argued for the use of detailed research logs, metadata, and provenance records. These logs should document the actual research trajectory: the initial task description, the prompts and system settings, the generated research ideas, the papers retrieved or summarized, the novelty assessments, the datasets considered, the baselines selected or rejected, the experiment plans, the hyperparameter search spaces, the code changes, the failed runs, the discarded results, the generated figures, and the manuscript sections drafted or revised by AI. The aim is not to confess AI use. The aim is to make the research process auditable.

We also suggested comprehensive versioning of research artifacts, for instance through Git histories, timestamped metadata, information about the AI model and configuration, and containerized environments such as Docker or Singularity. If an AI agent changes code, selects a parameter, drops an experiment, or produces a figure, future readers should be able to inspect that chain of events.
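To make the idea of an inspectable chain of events more concrete, here is a minimal sketch in Python of an append-only provenance log. It is an illustration under stated assumptions, not the format proposed in the AutoRecLab paper: the class names (`ProvenanceEvent`, `ProvenanceLog`), the field names, and the example actions are all hypothetical. Each entry stores the hash of the previous entry, so later edits to the history become detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """One step in the research trajectory. All field names are hypothetical."""
    actor: str     # who acted, e.g. "human:author" or "agent:tuner"
    action: str    # what happened, e.g. "select_hyperparameter"
    details: dict  # free-form payload: parameters, prompt, commit hash, ...
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProvenanceLog:
    """Append-only log in which each entry records the previous entry's hash."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(payload: dict) -> str:
        # Serialize with sorted keys so the same content always hashes the same.
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def append(self, event: ProvenanceEvent) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {"event": asdict(event), "prev_hash": prev_hash}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check that the chain links are intact.
        prev_hash = ""
        for entry in self.entries:
            body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True


# Example usage with made-up events.
log = ProvenanceLog()
log.append(ProvenanceEvent(
    actor="agent:tuner",
    action="select_hyperparameter",
    details={"name": "n_factors", "value": 64},
))
log.append(ProvenanceEvent(
    actor="human:author",
    action="drop_experiment",
    details={"reason": "data leakage"},
))
print(log.verify())
```

A real setup would probably lean on Git commits and container image digests rather than a hand-rolled hash chain. The sketch only shows that recording who did what, when, and with which settings is cheap compared with reconstructing it afterwards.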

We also proposed more structured mechanisms, such as a Research Attribution Markup Language, or RAML, to describe which parts of a manuscript were generated, edited, or suggested by AI, and a Research Process Markup Language, or RPML, to capture experiment setups, code versions, dataset versions, container images, hyperparameters, prompts, citation contexts, and other workflow metadata. This may sound bureaucratic. It probably will be, at least at first. But it is the kind of bureaucracy that can improve reproducibility, unlike a short AI-use paragraph that mainly reassures everyone that a box has been ticked.

So the future is not “AI was used,” and it is also not merely “AI was used in the following three ways.” The future is provenance. For simple AI assistance, a short explanation may be enough. For deeply AI-assisted research, especially AutoRecLab-style research, we will need research logs, archived artifacts, executable code, versioned prompts where they matter, documented search spaces, metadata, and clear human accountability. The key question should not be whether AI was used. In future research environments, of course it was. The key question should be whether the research process is transparent enough to be inspected, reproduced, and trusted.

The Downsides Are Real

The main downside is obvious: we will get more papers. Many will be bad. Some will be very bad. Some will be polished bad papers, which are worse because they waste more reviewer time before collapsing. Reviewers are already overloaded. AI-generated manuscripts will make this worse unless reviewers also receive strong AI support or we invent a different review system.

There is also a risk of homogenization. If many authors use similar tools, papers may become smoother but less original. Related work sections may become broader but shallower. Arguments may become more balanced but less brave. Academic writing already has a tendency to become oatmeal. AI may add milk.

But these arguments do not outweigh the benefits. Better tools can improve code, check citations, identify missing baselines, detect inconsistencies, and give early feedback to authors who otherwise submit weaker work. AI can reduce language barriers. It can help inexperienced researchers avoid avoidable mistakes. It can make research faster. It can free humans to spend more time on judgment, taste, theory, problem selection, and interpretation. That is the part of research I actually want humans to spend time on.

The Rule We Need: Accountability, Not Ritual Disclosure

The standard should be simple. Humans must be accountable. Claims must be supported. Code and data must be inspectable where possible. Citations must exist. Experiments must be reproducible. Review must be confidential and fair. If AI affects any of these, the artefact should reveal it through documentation, logs, code, metadata, or reproducibility material. But we should stop treating AI use itself as a moral category.

ACM’s revised policy is a step toward that future. I would go further, but I welcome the direction. ACM is saying that writing with AI is not the scandal. Submitting unreliable work is the scandal. That distinction is important.

Disclosure, Since We Are Talking About Disclosure

Of course, this blog post was mostly written by an AI, based on a comprehensive prompt, attached research papers, the ACM policy, a survey of related work, and multiple iterations for improvement. I take full responsibility for the content. The post reflects my opinion 100%.

The nice thing is that I could write this blog post in about 1 hour, including proofreading and thumbnail creation, instead of spending a full day or more writing it manually. That is not a bug. That is the point.

Tags: ACM, AI Agents, AI Policies, AI4Research, AI4Science, AutoRecLab, AutoRecSys, AutoResearch, AutoScience, Policy

About The Author

Joeran Beel

I am the founder of Recommender-Systems.com and head of the Intelligent Systems Group (ISG) at the University of Siegen, Germany (https://isg.beel.org). We conduct research in recommender systems (RecSys), personalization, and information retrieval (IR), as well as on automated machine learning (AutoML), meta-learning, and algorithm selection. Domains we are particularly interested in include smart places, eHealth, manufacturing (Industry 4.0), mobility, visual computing, and digital libraries. We founded or maintain, among others, LensKit-Auto, Darwin & Goliath, Mr. DLib, and Docear, each with thousands of users; we contributed to TensorFlow, JabRef, and others; and we developed the first prototypes of automated recommender systems (AutoSurprise and Auto-CaseRec) and Federated Meta Learning (FMLearn Server and Client).