Spatio-temporal graph dual-attention network for multi-agent prediction and tracking J Li, H Ma, Z Zhang, J Li, M Tomizuka IEEE Transactions on Intelligent Transportation Systems 23 (8), 10556-10569, 2021 | 160* | 2021 |
SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ... Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 157 | 2024 |
Towards efficient generative large language model serving: A survey from algorithms to systems X Miao, G Oliaro, Z Zhang, X Cheng, H Jin, T Chen, Z Jia arXiv preprint arXiv:2312.15234, 2023 | 58 | 2023 |
GradSign: Model performance inference with theoretical insights Z Zhang, Z Jia arXiv preprint arXiv:2110.08616, 2021 | 29 | 2021 |
Accelerating retrieval-augmented language model serving with speculation Z Zhang, A Zhu, L Yang, Y Xu, L Li, PM Phothilimthana, Z Jia arXiv preprint arXiv:2401.14021, 2024 | 5 | 2024 |
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models Z Zhang, D Zhao, X Miao, G Oliaro, Q Li, Y Jiang, Z Jia arXiv preprint arXiv:2401.07159, 2024 | 4 | 2024 |
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention L Yang, Z Zhang, Z Chen, Z Li, Z Jia arXiv preprint arXiv:2410.05076, 2024 | | 2024 |