ML Safety Social
December 9th, 2022. Virtual.


Designing systems to operate safely in real-world settings is a topic of growing interest in machine learning. As ML becomes more capable and widespread, long-term and long-tail safety risks will grow in importance. To make the adoption of ML more beneficial, various aspects of safety engineering and oversight need to be proactively addressed by the research community.

This workshop will bring together researchers from machine learning communities to focus on research topics in Robustness, Monitoring, Alignment, and Systemic Safety.

  • Robustness is designing systems to be reliable in the face of adversaries and highly unusual situations.
  • Monitoring is detecting anomalies and malicious use, and discovering unintended model functionality.
  • Alignment is building models that represent and safely optimize difficult-to-specify human values.
  • Systemic Safety is using ML to address broader risks related to how ML systems are handled, for example by defending against cyberattacks, facilitating cooperation, or improving the decision-making of public servants.

For more information about these problem categories or to submit, visit the call for papers page. We will award a total of $100,000 in paper prizes, as described below. Paper prize winners will be announced during closing remarks. For questions contact

NeurIPS site: To attend, register for a NeurIPS virtual pass.

Best Paper Awards ($50,000)

There is a $50,000 award pool for the best ML Safety papers accepted to this workshop. We highly encourage submissions in all areas of ML safety, spanning robustness, monitoring, alignment, and systemic safety. The award pool will be divided among 5 to 10 winning papers.



Paper Analysis Awards ($50,000)

This workshop is kindly sponsored by Open Philanthropy, which is also offering awards for accepted papers that provide analysis of how their work relates to the sociotechnical issues posed by AI. This $50,000 award pool is separate from the $50,000 best paper award pool.

AI safety analysis could be included in the appendix and does not need to be within the main paper. We intend to award papers that exceed a quality threshold in their safety analyses, so many papers that make an honest effort can win a portion of the $50,000 prize pool. Quality safety analyses must relate to some concepts, considerations, or strategies for reducing sociotechnical issues from AI. Paper analyses do not need to follow any existing template.


The speakers for the invited and contributed talks will be added when these details are finalized. Recipients of the 'top paper' prizes will have the opportunity to give talks (which is what the 'contributed talk' slots are reserved for). The schedule is subject to change.

Times indicated are in Central Time (CT).

  • 9:00am - 9:10am Opening remarks
  • 9:10am - 9:40am Sharon Li: How to Handle Distributional Shifts? Challenges, Research Progress and Future Directions
  • 9:40am - 10:25am Morning Poster Session
  • 10:25am - 10:45am Coffee Break
  • 10:45am - 11:15am Bo Li: Trustworthy Machine Learning via Learning with Reasoning
  • 11:15am - 12:00pm Afternoon Poster Session
  • 12:00pm - 12:45pm Lunch
  • 12:45pm - 1:15pm Dorsa Sadigh: Aligning Robot Representations with Humans
  • 1:15pm - 1:45pm David Krueger: Sources of Specification Failure
  • 1:45pm - 2:00pm Coffee Break
  • 2:00pm - 2:30pm David Bau: Direct model editing: a framework for understanding model knowledge
  • 2:30pm - 3:00pm Sam Bowman: What's the deal with AI safety?
  • 3:00pm - 3:55pm Live Panel discussion with speakers
  • 3:55pm - 4:00pm Closing remarks

Important dates

  • Submission Deadline: Wednesday, Oct 5th AOE (updated)
  • Workshop: Friday, December 9th.



Robustness

  • Adversaries: How can we create models that resist adversarial attacks, including attacks that are beyond the ℓ_p ball, perceptible, or unforeseen?
  • Long Tails: How can we create models that do not misgeneralize or adapt in the face of long-tail events, feedback loops, and unknown unknowns?


Monitoring

  • Anomaly Detection: How can we detect anomalous and malicious use, and how should these events be handled once they are detected?
  • Calibration: How can we create models that are calibrated and accurately and honestly represent their beliefs?
  • Hidden Model Functionality: How can we detect whether models have unexpected latent functionality, such as backdoors or emergent capabilities?


Alignment

  • Specification: How can we teach ML models complex human values?
  • Optimization: How do we train agents to optimize for goals that integrate human values and how do we prevent agents from pursuing unintended instrumental goals?
  • Brittleness: How can we prevent ML systems from gaming proxy objectives?
  • Unintended Consequences: How can we design proxy objectives that do not create unintended feedback loops, do not create unwanted instrumental goals, and do not incentivize irreversible actions?

Systemic Safety

  • ML for Cybersecurity: How can ML be used to patch insecure code or detect cyberattacks?
  • Informed Decision Making: How can ML be used to forecast events and raise crucial considerations?

We encourage submissions to demonstrate evidence of scalability. In particular, we encourage submissions that can improve the safety of large-scale ML models and can plausibly improve or apply to future larger versions of these models.

To read more about these categories, refer to Unsolved Problems in ML Safety.

Submission Instructions

Submission link:

The recommended paper length is 4 pages. Submissions may include supplementary material, but reviewers are not required to read beyond the first 4 pages. References can take as many pages as necessary and do not count towards the 4-page limit. Submissions should be in PDF format. Please use these LaTeX style files. The reviewing process is double-blind, so submissions should be anonymized and should not contain information that could identify the authors. If the authors' work has already been published in a journal, conference, or workshop, the submission should meaningfully extend that previous work. However, parallel submission (to a journal, conference, workshop, or preprint repository) is allowed. If your paper is accepted, you will be invited to present a poster at the workshop and may also be invited to give a talk. Accepted submissions will be shown on the workshop website.