Constitutional AI: Harmlessness from AI Feedback

2026-01-29•Reference_R-01-1

Source

Bai et al. — Constitutional AI: Harmlessness from AI Feedback

Anthropic's Constitutional AI approach uses a set of principles (a 'constitution') to guide model behavior without relying solely on human feedback. Key insight: AI can critique and revise its own outputs based on ethical principles.

#AI Safety

#RLHF

#Anthropic

#Alignment

2026-01-29