Biography

Hi, I am Huanqia Cai, a researcher at Alibaba Tongyi Lab. My research interests lie in multimodal understanding and generation. Currently, I am focused on leveraging reinforcement learning to enhance the visual fidelity of image generation and the complex reasoning capabilities of multimodal models.

I received my master's degree from the University of Chinese Academy of Sciences (UCAS). During my graduate studies, I was fortunate to intern under the supervision of Dr. Wei Liu.

Prior to joining Alibaba, I worked at Tencent, where I served as a core contributor to the Tencent Hunyuan Vision Language Model under the guidance of Dr. Han Hu.

News

  • 2025 Released Z-Image, an efficient foundation model specializing in photorealistic image generation.
  • 2025 New paper on Self-Correction in LLMs released on arXiv.
  • 2025 Released MM-IQ, a new benchmark for assessing the core reasoning capabilities of large multimodal models.
  • 2024 Paper "System-2 Mathematical Reasoning" accepted to TMLR.
  • 2023 The extended paper "Tri-token Equipped Transformer Model for Image Matting" released on arXiv.
  • 2022 TransMatting accepted to ECCV.
  • 2021 Won the 2nd Place Award in the NTIRE 2021 Challenge on Multi-modal Aerial View Object Classification at CVPR 2021.

Publications

Z-Image
Z-Image Team, Huanqia Cai, Sihan Cao, et al.
Tech Report, 2025
Z-Image is an efficient foundation model for image generation. It produces photorealistic images while keeping computational cost low.
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
Huanqia Cai, Yijun Yang, Winston Hu
arXiv Preprint, 2025
Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning
Kuofeng Gao*, Huanqia Cai*, Qingyao Shuai, Dihong Gong, Zhifeng Li
arXiv Preprint, 2024
System-2 Mathematical Reasoning via Enriched Instruction Tuning
Huanqia Cai, Yijun Yang, Zhifeng Li
Transactions on Machine Learning Research (TMLR), 2024
Tri-token Equipped Transformer Model for Image Matting
Huanqia Cai, Fanglei Xue, Lele Xu, Lili Guo, Zhifeng Li, Wei Liu
arXiv Preprint, 2023
TransMatting: Enhancing Transparent Objects Matting with Transformers
Huanqia Cai, Fanglei Xue, Lele Xu, Lili Guo
European Conference on Computer Vision (ECCV), 2022