Neurips 2024 Workshop
Towards Safe & Trustworthy Agents

NeurIPS 2024 @ Vancouver, Canada
Dec 15, 2024
(in-person)

Submission Website

Overview

This workshop aims to clarify key questions on the safety of agentic AI systems and foster a community of researchers working in this area.

To this end, we have prepared a diverse and comprehensive schedule of speakers and organizers, to bring together experts in the field. We are also making a call for papers below, with submissions also accepted through the 'submit' button above.

Please note that the order of speakers in the schedule below is subject to change depending on speaker schedule and availability. We will have a confirmed speaker order closer to the time.

Call for papers deadline: September 14th, 2024

Schedule

Event
Description
Duration
Start Time
Opening Remark
Alex Pan
10mins
9:00 - 9:10
PT
Invited Talk 1
Anca Dragan (Google Deepmind and UC Berkeley)
30mins
9:10 - 10:40
PT
Invited Talk 2
David Bau (Northeastern)
30mins
9:40 - 10:10
PT
Contributed Talks
Submitted author
40mins
10:10 - 10:50
PT
Coffee Break
25mins
10:50 - 11:15
PT
Invited Talk 3
Been Kim (Google Deepmind)
30mins
11:15 - 11:45
PT
Live Poster Session 1
Submitted author
45mins
11:45 - 12:30
PT
Lunch
1hr
12:30 - 1:30
PT
Invited Talk 4
David Krueger (Cambridge)
30mins
1:30 - 2:00
PT
Invited Talk 5
Daniel Kang (UIUC)
30mins
2:00 - 2:30
PT
Invited Talk 6
Yu Su (Ohio State)
30mins
2:30 - 3:00
PT
Live Poster Session 2
Submitted author
45mins
3:00 - 3:45
PT
Coffee Break
15mins
3.45 - 4:00
PT
Panel Discussion and Reflection
Speakers and Organizers
55mins
4:00 - 4:55
PT
Closing Remarks
Alex Pan
5mins
4:55 - 5:00
PT

The schedule above is tentative and the order of speakers could change.

Call for Papers

Foundation models are increasingly being augmented with new modalities and access to a variety of tools and software [9, 5, 3, 11]. Systems that can take action in a more autonomous manner have been created by assembling agent architectures or scaffolds that include basic forms of planning and memory, such as ReAct [12], RAISE [7], Reflexion [10], and AutoGPT + P [2], or multi-agent architectures such as DyLaN [8] and AgentVerse [4]. As these systems are made more agentic, this could unlock a wider range of beneficial use-cases, but also introduces new challenges in ensuring that such systems are trustworthy [6]. Interactions between different autonomous systems create a further set of issues around multi-agent safety [1]. The scope and complexity of potential impacts from agentic systems means that there is a need for proactive approaches to identifying and managing their risks. Our workshop will surface and operationalize these questions into concrete research agendas.

This workshop aims to clarify key questions on the trustworthiness of agentic AI systems and foster a community of researchers working in this area. We welcome papers on topics including, but not limited to, the following:


Research into safe reasoning and memory. We are interested in work that makes LLM agent
reasoning or memory trustworthy, e.g., by preventing hallucinations or mitigating bias.
Research into adversarial attacks, security and privacy for agents. As LLM agents interact
with more data modalities and a wider variety of input/output channels, we are interested in work
that studies or defends against possible threats and privacy leaks.
Research into controlling agents. We are interested in novel control methods which specify goals,
constraints, and eliminate unintended consequences in LLM agents.
Research into agent evaluation and accountability. We are interested in evaluation for LLM
agents (e.g., automated red-teaming) and interpretability + attributability of LLM agent actions.
Research into environmental and societal impacts of agents. We are interested in research that
examines the environmental cost, fairness, social influence, and economic impacts of LLM agents.
Research into multi-agent safety and security. We are interested in research that analyzes novel
phenomena with multiple agents: emergent functionality at a group level, collusion between agents, correlated failures, etc.

Submission Guide

  • Submission site: Submissions should be made on OpenReview.
  • Submissions are non-archival: we receive submissions that are also undergoing peer review elsewhere at the time of submission, but we will not accept submissions that have already been previously published or accepted for publication at peer-reviewed conferences or journals. Submission is permitted for papers presented or to be presented at other non-archival venues (e.g. other workshops). No formal workshop proceedings will be published.
  • Social Impact Statement: authors are required to include a "Social Impact Statement" that highlights "potential broader impact of their work, including its ethical aspects and future societal consequences".
  • Submission Length and Format: Submissions should be anonymised papers up to 5 pages (appendices can be added to the main PDF); excluding references and Social Impacts Statement. You must format your submission using the NeurIPS 2024 style file.
  • Paper Review: All reviews are double-blinded, with at least two reviewers assigned to each paper.
  • Camera Ready Instructions: The camera ready version is composed of a main body, which can be up to 6 pages long, followed by unlimited pages for a Social Impact Statement, references and an appendix, all in a single file. The camera-ready versions of all accepted submissions should be uploaded by the authors to the OpenReview page for corresponding submissions. The camera ready version will be publicly available to everyone on the Camera-Ready Deadline displayed below.

Key Submission Dates:

  • Submissions Open:
  • July 31st, 2024
  • Suggested deadline for submissions:
  • September 14th, 2024
  • Accept/Reject Notification Date:
  • October 14th, 2024

Speakers and Panelists

Anca Dragan

Google Deepmind & Associate Professor, UC Berkeley

Been Kim

Senior Staff Research Scientist, Google Deepmind

David Bau

Assistant Professor, Northeastern

Yu Su

Associate Professor, Ohio State

David Krueger

Associate Professor, Cambridge

Daniel Kang

Associate Professor, UIUC

Workshop Organizers

Dawn Song

Professor, UC Berkeley

Bo Li

Associate Professor, University of Chicago

Karthik Narasimhan

Associate Professor, Princeton

Kimin Lee

Associate Professor, KAIST

Alexander Pan

PhD Candidate (3rd year), UC Berkeley CS

Isabelle Barrass

Project Manager, Center for AI Safety

Frequently Asked Questions

Can we submit a paper that will also be submitted to NeurIPS 2024?

Will the reviews be made available to authors?

Can I submit a paper that has been submitted to another ML conference?

I have a question not answered here, who should I reach out to?

References:

  1. Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase,
    Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, et al. Foundational
    challenges in assuring alignment and safety of large language models. arXiv preprint
    arXiv:2404.09932, 2024.
  2. Timo Birr, Christoph Pohl, Abdelrahman Younes, and Tamim Asfour. Autogpt+ p: Affordance-based task planning with large language models. arXiv preprint arXiv:2402.10778, 2024.
  3. Daniil A Boiko, Robert MacKnight, and Gabe Gomes. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332, 2023.
  4. Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chen Qian, Chi-Min Chan, Yujia Qin, Yaxi Lu, Ruobing Xie, et al. Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv preprint arXiv:2308.10848, 2023.
  5. Tom Davidson, Jean-Stanislas Denain, Pablo Villalobos, and Guillem Bas. Ai capabilities
    can be significantly improved without expensive retraining. arXiv preprint arXiv:2312.07413,
    2023.
  6. Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan
    Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, et al. The ethics of
    advanced ai assistants. arXiv preprint arXiv:2404.16244, 2024.
  7. Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, and Ming Cui. From llm to
    conversational agent: A memory enhanced architecture with fine-tuning of large language
    models. arXiv preprint arXiv:2401.02777, 2024.
  8. Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. Dynamic llm-agent net-
    work: An llm-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170, 2023.
  9. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36, 2024.
  10. Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Re-flexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  11. Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan,
    and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023.
  12. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.