Aligning AI with Shared Human Values

We show how to assess a language model’s knowledge of basic concepts of
morality. We introduce the ETHICS dataset, a new benchmark that spans concepts
in justice, well-being, duties, virtues, and commonsense morality. Models
predict widespread moral judgm…