I am Heming Xia (夏鹤明), a third-year Ph.D. student in the NLP Group at The Hong Kong Polytechnic University, advised by Prof. Wenjie Li. I received my master’s degree from the MOE Key Lab of Computational Linguistics at Peking University, where I was advised by Prof. Zhifang Sui. Before that, I earned my bachelor’s degree from the School of Physics at Peking University in 2020. I have also worked as a research intern at the NLC Group of Microsoft Research Asia and at Sea AI Lab, where I was fortunate to collaborate with Dr. Tao Ge and Dr. Cunxiao Du. In Spring 2026, I visited the NLP Group at the University of California, San Diego, where I had the privilege of working with Prof. Julian McAuley. For more details, please see my CV.

📬 I am open to collaborating with highly motivated students on research related to (but not limited to) the topics below. If you are interested, please feel free to reach out via email.

Research

My research focuses on efficient and effective NLP, with the goal of making LLMs faster, more scalable, and broadly applicable. Specifically, my work centers on the following directions:

Speculative Decoding: Exploring inference acceleration techniques that maintain output fidelity. This includes our pioneering work on Speculative Decoding [EMNLP’23-findings, ICLR’25], the widely used benchmark Spec-Bench, and the first comprehensive survey [ACL’24-findings] of this paradigm.
Efficient Reasoning: Developing advanced algorithms to enhance the efficiency of reasoning models, spanning efficient training strategies, inference acceleration [EMNLP’25, ACL’26], and dense representations such as latent CoT [arXiv’25].
Applications (Efficiency + X): I am interested in how efficiency-oriented techniques can benefit broader applications, with a recent focus on tool-augmented models [arXiv’26] and multimodal models [EMNLP’25].

In addition, I am actively working on tool learning [e.g., EMNLP’24, ACL’25-findings] and vision-language understanding [e.g., ACL’22, EMNLP’23-findings, EMNLP’25-findings].

News

Apr 06, 2026	Got four papers accepted by ACL 2026 (1 Main + 3 Findings)
Aug 21, 2025	Got three papers accepted by EMNLP 2025 (2 Main + 1 Findings)
May 16, 2025	Got three papers accepted by ACL 2025 (1 Oral + 2 Findings)
Jan 23, 2025	Got one paper accepted by ICLR 2025
Jan 19, 2025	Organized a tutorial on Speculative Decoding at COLING 2025

Selected Publications

(*) Equal Contribution. (†) Corresponding Author.

ACL

Merlin’s Whisper: Enabling Efficient Reasoning in LLMs via Black-box Persuasive Prompting

Heming Xia, Cunxiao Du^†, Rui Li, Chak Tou Leong, Yongqi Li^†, and Wenjie Li

In ACL, 2026

arXiv Code

EMNLP

TokenSkip: Controllable Chain-of-Thought Compression in LLMs

Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li^†, and Wenjie Li

In EMNLP, 2025

Best Oral @ PolyU HTML Code

Best Oral Presentation at PolyU COMP 2025 Research Student Conference.

EMNLP

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning

Yicheng Ji^*, Jun Zhang^*, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, and Huan Li^†

In EMNLP, 2025

HTML Code

ACL

Towards Harmonized Uncertainty Estimation for Large Language Models

Rui Li, Jing Long, Muge Qi, Heming Xia, Lei Sha, Peiyi Wang, and Zhifang Sui^†

In ACL, Oral Presentation, 2025

HTML Code

Tutorial

Speculative Decoding for Efficient LLM Inference

Heming Xia, Yongqi Li, Cunxiao Du, Qian Liu, and Wenjie Li

In COLING, 2025

Video Slides Website

ICLR

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li^†, Jun Zhang, Cunxiao Du, and Wenjie Li

In ICLR, 2025

HTML Code

EMNLP

AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction

Hongru Wang^*, Rui Wang^*, Boyang Xue, Heming Xia, Jingtao Cao, Zeming Liu, Jeff Z. Pan^†, and Kam-Fai Wong^†

In EMNLP, 2024

HTML Code

EMNLP

A Survey on In-context Learning

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui

In EMNLP, 2024

HTML Paper List 机器之心

ACL

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding

Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, and Zhifang Sui

In Findings of ACL, 2024

HTML Code Paper List 机器之心

EMNLP

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

Heming Xia^*, Tao Ge^*†, Peiyi Wang, Si-Qing Chen, Furu Wei, and Zhifang Sui

In Findings of EMNLP, 2023

HTML Code