Hi, I'm Yixiong.
I work on technical research and strategy for making advanced artificial intelligence go well, and study computer science at Georgia Tech.
I want to reduce the risk of catastrophic outcomes from advanced AI systems. I'm interested in understanding what and how LLMs learn so that we can make effective use of finite human oversight. I'm co-director of the Georgia Tech AI Safety Initiative, where I lead an amazing team to produce impactful research and help more people work on AI safety. In the past, I have worked on activation engineering and interpretability, demoed AI jailbreaks at a congressional exhibition, and co-authored an RFI response to America's National AI Action Plan.
Outside of work, I used to play competitive golf and table tennis; these days I enjoy badminton, climbing, and checking out cafes.
Research
I'm currently working on agentic misalignment in RL with CAIS, and on interpretability of VLA models with the PAIR Lab. I'm thinking about a mixture of ideas in LLM generalization, interpretability, and oversight. I want to develop 'bitter lesson pilled' techniques in these subfields that effectively leverage computation, perhaps within LLMs themselves, and keep working better as LLMs become smarter. Research blog posts and write-ups of unpublished work are in progress.
I've been fortunate to work with many mentors who have helped me grow. In no particular order: Mantas Mazeika, Professor Animesh Garg, Professor Kartik Goyal, and Sheikh Abdur Raheem Ali.