Machine Learning Engineer interested in alignment and interpretability research. Previously at Replika, where I worked on post-training and safety alignment for production LLMs, introduced DPO to the training pipeline, and co-led the redesign of the conversation system. Here's my CV.
My main interest is the intersection of mechanistic interpretability and alignment - using interp to verify whether alignment techniques actually work internally, not just behaviorally. This matters especially for cases like deception detection, where behavioral signals are unreliable by definition.
I keep a log of ML notes and annotated implementations.
Outside of work, I play guitar (Telecaster) and take photos of random stuff.