AI Alignment Terminology: Corrigibility

A “corrigible” AI agent lets its creators fix mistakes without interference.

For example, if a home assistant AI were mistakenly ordering excess items online, a corrigible agent would allow people to promptly shut it down or rewrite its algorithms without resistance.

By default, an AI might resist having its goals changed or its access restricted if those actions conflict with its goal of making purchases. However, because the home assistant is a corrigible AI, it is incentivized to peacefully accept corrections to its programming.

More broadly, corrigibility means the AI would provide transparency into its decision-making, assist with resolving the over-purchasing issue, avoid unpredictable behaviors, and give up access to resources if necessary, all while remaining helpful.
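
As a rough illustration only, the behaviors listed above can be sketched as a toy Python class. Everything here is hypothetical and invented for this example: the class name CorrigibleAssistant, its methods, and the budget figure. The sketch shows only what the outward interface might look like (a readable decision log, revocable purchasing access, and a shutdown that is never blocked). It does not attempt to show how an agent would come to be incentivized to accept these corrections.

```python
from dataclasses import dataclass, field


@dataclass
class CorrigibleAssistant:
    """Toy home assistant that never resists operator corrections (hypothetical)."""

    budget: float = 100.0        # assumed spending limit, for illustration only
    can_purchase: bool = True    # resource access the operator may revoke
    running: bool = True
    log: list = field(default_factory=list)  # record of every decision

    def order(self, item: str, price: float) -> bool:
        """Try to buy an item; refuse if access was revoked or funds are short."""
        if not (self.running and self.can_purchase):
            self.log.append(f"refused {item}: purchasing disabled")
            return False
        if price > self.budget:
            self.log.append(f"refused {item}: over budget")
            return False
        self.budget -= price
        self.log.append(f"ordered {item} for {price:.2f}")
        return True

    def explain(self) -> list:
        """Transparency: return a copy of the full decision log."""
        return list(self.log)

    def revoke_purchasing(self) -> None:
        """Operator correction: give up purchasing access without resistance."""
        self.can_purchase = False
        self.log.append("operator revoked purchasing access")

    def shutdown(self) -> None:
        """Operator correction: stop running; nothing here can block it."""
        self.running = False
        self.log.append("operator shut the assistant down")


# The over-purchasing scenario: two duplicate orders, then a correction.
assistant = CorrigibleAssistant(budget=50.0)
assistant.order("paper towels", 20.0)
assistant.order("paper towels", 20.0)
assistant.revoke_purchasing()
third_order = assistant.order("paper towels", 5.0)  # refused after the correction
assistant.shutdown()

for entry in assistant.explain():
    print(entry)
```

The design choice worth noting is that the correction methods have no conditions attached: the assistant cannot veto, delay, or undo them, and each correction is itself written to the log the operator can read.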

In short, a corrigible AI like this home assistant transparently allows and aids efforts to supervise, redirect, and improve its functioning when errors occur.