Gabriele Oliaro
Title
Cited by
Year
SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification
X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ...
arXiv preprint arXiv:2305.09781, 2023
Cited by 46*, 2023
Towards efficient generative large language model serving: A survey from algorithms to systems
X Miao, G Oliaro, Z Zhang, X Cheng, H Jin, T Chen, Z Jia
arXiv preprint arXiv:2312.15234, 2023
Cited by 18, 2023
Zero-CPU Collection with Direct Telemetry Access
J Langlet, RB Basat, S Ramanathan, G Oliaro, M Mitzenmacher, M Yu, ...
ACM Workshop on Hot Topics in Networks (HotNets '21), 108–115, 2021
Cited by 10, 2021
Direct Telemetry Access
J Langlet, R Ben Basat, G Oliaro, M Mitzenmacher, M Yu, G Antichi
ACM SIGCOMM 2023 Conference, 832–849, 2023
Cited by 2, 2023
Quantized side tuning: Fast and memory-efficient tuning of quantized large language models
Z Zhang, D Zhao, X Miao, G Oliaro, Q Li, Y Jiang, Z Jia
arXiv preprint arXiv:2401.07159, 2024
Cited by 1, 2024
Optimal Kernel Orchestration for Tensor Programs with Korch
M Hu, A Venkatram, S Biswas, B Marimuthu, B Hou, G Oliaro, H Wang, ...
Proceedings of the 29th ACM International Conference on Architectural …, 2024
2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
X Miao, G Oliaro, X Cheng, M Wu, C Unger, Z Jia
arXiv preprint arXiv:2402.18789, 2024
2024
Articles 1–7