Heming Xia

he-ming.xia AT connect.polyu.hk

I am Heming Xia (夏鹤明), a Ph.D. student in the NLP Group at The Hong Kong Polytechnic University, supervised by Prof. Wenjie Li. I earned my master’s degree from the MOE Key Lab of Computational Linguistics at Peking University, where I was advised by Prof. Zhifang Sui. Prior to that, I received my bachelor’s degree from the School of Physics at Peking University in 2020. I have also worked as a Research Intern in the NLC Group at Microsoft Research Asia, where I had the privilege of collaborating with Dr. Tao Ge. Currently, I am an intern at SEA AI Lab, working closely with Dr. Cunxiao Du. For more details, please see my CV.

📬 I am open to collaborating with highly motivated students on research related to (but not limited to) the topics below. If interested, please feel free to reach out via email.

Research

My research focuses on efficient and effective NLP, with the goal of making LLMs faster, more scalable, and broadly applicable. Specifically, my work centers on the following directions:

  • Speculative Decoding: Exploring inference acceleration techniques that maintain output fidelity. This includes our pioneering work on Speculative Decoding [EMNLP’23-findings, ICLR’25], the widely used benchmark Spec-Bench, and the first comprehensive survey of this paradigm [ACL’24-findings].
  • Efficient Reasoning: Developing advanced algorithms to enhance the efficiency of reasoning models, spanning efficient training strategies, inference acceleration [EMNLP’25, arXiv’25], and dense representations such as latent CoT [arXiv’25].
  • Applications (Efficiency + X): I am interested in how efficiency-oriented techniques can benefit broader applications, with a recent focus on tool-augmented agents and multimodal models [EMNLP’25].

In addition, I am actively working on tool learning [e.g., EMNLP’24, ACL’25-findings] and vision-language understanding [e.g., ACL’22, EMNLP’23-findings, EMNLP’25-findings].

News

Aug 21, 2025 Got three papers accepted by EMNLP 2025 (2 Main+1 Findings) 🎉
May 16, 2025 Got three papers accepted by ACL 2025 (1 Oral+2 Findings) 🎉
Jan 23, 2025 Got one paper accepted by ICLR 2025 🎉
Jan 19, 2025 Organized a tutorial on Speculative Decoding at COLING 2025 🤗
Sep 21, 2024 Got four papers accepted by EMNLP 2024 (2 Main+2 Findings) 🎉

Selected Publications

(*) Equal Contribution. (†) Corresponding Author.
  1. TokenSkip: Controllable Chain-of-Thought Compression in LLMs
    Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, and Wenjie Li
    In EMNLP, 2025
  2. SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
    Yicheng Ji*, Jun Zhang*, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, and Huan Li
    In EMNLP, 2025
  3. Towards Harmonized Uncertainty Estimation for Large Language Models
    Rui Li, Jing Long, Muge Qi, Heming Xia, Lei Sha, Peiyi Wang, and Zhifang Sui
    In ACL (Oral), 2025
  4. Speculative Decoding for Efficient LLM Inference
    Heming Xia, Yongqi Li, Cunxiao Du, Qian Liu, and Wenjie Li
    In COLING (Tutorial), 2025
  5. SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
    Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, and Wenjie Li
    In ICLR, 2025
  6. AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
    Hongru Wang*, Rui Wang*, Boyang Xue, Heming Xia, Jingtao Cao, Zeming Liu, Jeff Z. Pan, and Kam-Fai Wong
    In EMNLP, 2024
  7. A Survey on In-context Learning
    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui
    In EMNLP, 2024
  8. Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
    Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, and Zhifang Sui
    In Findings of ACL, 2024
  9. Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
    Heming Xia*, Tao Ge*†, Peiyi Wang, Si-Qing Chen, Furu Wei, and Zhifang Sui
    In Findings of EMNLP, 2023