What you’ll do:
- Evaluate advanced AI systems and detect potential risks (e.g. deceptive behaviors)
- Work on interpretability research to uncover how models really work
- Build tools that turn research into scalable, production-ready evaluations
- Strong Python + ML/NN background
- Experience in AI safety, alignment, or interpretability research
- Ability to write clean, production-quality code
- Curiosity, analytical mindset, and strong communication
#LI-OP1