Publications

2026

ICML’26

WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving

Chiheng Lou, Sheng Qi, Rui Kang, and 6 more authors

In 43rd International Conference on Machine Learning (ICML 26) (to appear), 2026

PDF

NSDI’26

HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds

Chiheng Lou, Sheng Qi, Chao Jin, and 5 more authors

In 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26), 2026

PDF

2025

TON

Efficient Far Memory-Aware Scheduling With FaMAS

Chiheng Lou and Xin Jin

IEEE Transactions on Networking, 2025

HTML

2024

ASPLOS’24

SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers

Daliang Xu, Mengwei Xu, Chiheng Lou, and 4 more authors

In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 24), 2024

PDF

TMC

Efficient, Scalable, and Sustainable DNN Training on SoC-Clustered Edge Servers

Mengwei Xu, Daliang Xu, Chiheng Lou, and 4 more authors

IEEE Transactions on Mobile Computing, 2024

HTML