The Inner Alignment Problem
Neural networks, like any learned model, often behave differently from what their programmers intended. A useful way to classify these failures is to distinguish outer misalignment from inner misalignment.