Xun Huang's Academic Homepage

I am a Technical Director at Roblox, where I focus on Roblox Reality, the hybrid generative AI architecture powering the next generation of multiplayer photorealistic gaming experiences. Previously, I was the Founder and CEO of Morpheus AI, which was acquired by Roblox. Before that, I held roles as a Research Scientist at Adobe Research, an Adjunct Professor at CMU, and a Research Scientist at NVIDIA.

My work has helped shape the foundations of modern video world models. I pioneered key architectures and algorithms such as Self Forcing and Autoregressive Video Diffusion Transformers (in CausVid). Earlier, I developed one of the first public text-to-image products (GauGAN2), as well as NVIDIA's first text-to-image and text-to-3D foundation models. My research has been cited over 19,000 times, including more than 14,000 citations to projects that I led.

I have been working on multimodal "Generative AI" for 10 years. I obtained my PhD in Computer Science from Cornell in 2020, advised by Professor Serge Belongie. During my PhD, I invented Adaptive Instance Normalization (AdaIN) and was the first to demonstrate its effectiveness in generative neural networks. AdaIN became a foundational component of StyleGAN and played a key role in the first working diffusion model. Variants of AdaIN are now used in nearly all diffusion models. My PhD research was supported by the Adobe Research Fellowship (2019), the Snap Research Fellowship (2019), and the NVIDIA Graduate Fellowship (2018).

Selected/Recent Publications

MonarchRT: Efficient Attention for Real-Time Video Generation

arXiv 2026

Krish Agarwal, Zhuoming Chen, Cheng Luo, Yongqi Chen, Haizhong Zheng, Xun Huang, Atri Rudra, Beidi Chen

[arXiv] [Project] [Code]

Causality in Video Diffusers is Separable from Denoising

CVPR 2026

Xingjian Bai, Guande He, Zhengqi Li, Eli Shechtman, Xun Huang, Zongze Wu

[arXiv]

MotionStream: Real-Time Video Generation with Interactive Motion Controls

ICLR 2026 (Oral)

Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

[arXiv] [Project]

Learning an Image Editing Model without Image Editing Pairs

ICLR 2026

Nupur Kumari, Sheng-Yu Wang, Nanxuan Zhao, Yotam Nitzan, Yuheng Li, Krishna Kumar Singh, Richard Zhang, Eli Shechtman, Jun-Yan Zhu, Xun Huang

[arXiv] [Project] [Code]

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

NeurIPS 2025 (Spotlight)

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman

[arXiv] [Project] [Code]

Long-Context State-Space Video World Models

ICCV 2025

Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang

[arXiv] [Project]

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

CVPR 2025

Tianwei Yin*, Qiang Zhang*, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang

[arXiv] [Project] [Code]

Magic3D: High-Resolution Text-to-3D Content Creation

CVPR 2023 (Highlight)

Chen-Hsuan Lin*, Jun Gao*, Luming Tang*, Towaki Takikawa*, Xiaohui Zeng*, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

[arXiv] [Project] [Video]

eDiff-I: Text-to-Image Diffusion Models with Ensemble of Expert Denoisers

arXiv 2022

Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

[arXiv] [Project] [Video]

Multimodal Conditional Image Synthesis with Product-of-Experts GANs

ECCV 2022

Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

[arXiv] [Project] [Video] [Two Minute Papers]

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

ICCV 2019 (Oral)

Guandao Yang*, Xun Huang*, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan

[arXiv] [Code] [Video]

Multimodal Unsupervised Image-to-Image Translation

ECCV 2018

Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz

[arXiv] [Code] [Video]

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

ICCV 2017 (Oral)

Xun Huang, Serge Belongie

[arXiv] [Code]

Stacked Generative Adversarial Networks

CVPR 2017

Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie

[arXiv] [Code]

* indicates equal contribution.
See Google Scholar for the full list of publications.

Xun Huang (/shuun hwang/)
