Introduction to ML Safety Spring 2023

An introduction to safety topics for students with a background in Deep Learning.

Apply by January 29th 2023


In recent years, the machine learning community has made substantial progress on a wide variety of problems: competition mathematics, colorizing images, protein folding, superhuman poker, art generation, etc. This has led to the increased adoption of machine learning systems in high-stakes settings such as medicine and self-driving cars, and has raised a host of safety concerns. Some of these concerns apply to current systems: how do we prevent driverless cars from misidentifying a stop sign in a blizzard? Others are more forward-looking: how can we ensure general AI systems pursue safe and beneficial goals? This course serves as an introduction to the body of technical research relevant to both kinds of concern, but it emphasizes future, high-consequence risks: could future AI systems pose an existential threat?

As with other powerful technologies, safety for machine learning should be a leading research priority. In this spirit, we want to bring you to the frontiers of this nascent field. The course materials are created by Dan Hendrycks, a UC Berkeley ML PhD and the director of the Center for AI Safety. The program will be virtual by default, though students may be able to attend in-person discussion groups at some universities.


The course covers:

  1. Hazard Analysis: We introduce concepts from the field of hazard analysis and how they can be applied to ML systems, and give an overview of standard models of risks and accidents.
  2. Robustness: Robustness focuses on ensuring models behave acceptably when exposed to abnormal, unforeseen, unusual, highly impactful, or adversarial events. We cover techniques for generating adversarial examples and making models robust to them; benchmarks for measuring robustness to distribution shift; and approaches to improving robustness via data augmentation, architectural choices, and pretraining techniques.
  3. Monitoring: We cover techniques to identify malicious use, hidden model functionality and data poisoning, and emergent behaviour in models; metrics for OOD detection; confidence calibration for deep neural networks; and transparency tools for neural nets.
  4. Alignment: We define alignment as reducing inherent model hazards. We cover measuring honesty in models; power aversion; an introduction to ethics; and imposing ethical constraints in ML systems.
  5. Systemic Safety: In addition to directly reducing hazards from AI systems, there are several ways that AI can be used to make the world better equipped to handle the development of AI by improving sociotechnical factors like decision-making ability and safety culture. We cover using ML for improved epistemics; ML for cyberdefense; and ways in which AI systems could be made to better cooperate.
  6. Additional X-Risk Discussion: The last section of the course explores the broader importance of the concepts covered: namely, existential risk and possible existential hazards. We cover specific ways in which AI could potentially cause an existential catastrophe, such as weaponization, proxy gaming, treacherous turn, deceptive alignment, value lock-in, and persuasive AI. We also introduce some considerations for influencing future AI systems and research on selection pressures.

Time Commitment and Stipend

The program will last 8 weeks, beginning on February 20th and ending on April 21st. Participants are expected to commit at least 5 hours per week. This includes ~1 hour of recorded lectures, ~1-2 hours of readings, ~1-2 hours of written assignments, and 1 hour of discussion.

We understand that 5 hours per week is a large time commitment, so to make our program more inclusive and remove any financial barriers, we will provide a $500 stipend upon completion of the course.


The material is designed for students with strong technical backgrounds. The prerequisites are:

  • Deep Learning (courses and/or prior research experience preferred)
  • At least one of linear algebra or introductory statistics (e.g., AP Statistics)
  • Multivariate Differential Calculus

If you do not meet the prerequisites, or are not sure whether you do, feel free to apply anyway. We will review applicants on a case-by-case basis.

Apply by January 29th