Neural networks (or any learned model) may often behave differently from the intentions of the programmers. A useful distinction between the failure modes is between outer and inner misalignment.
Share this post
The Inner Alignment Problem
Share this post
Neural networks (or any learned model) may often behave differently from the intentions of the programmers. A useful distinction between the failure modes is between outer and inner misalignment.