LoRA: Low-Rank Adaptation of Large Language Models
EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen
arXiv preprint arXiv:2106.09685, 2021. Cited by 11044.

DeBERTa: Decoding-Enhanced BERT with Disentangled Attention
P He, X Liu, J Gao, W Chen
arXiv preprint arXiv:2006.03654, 2020. Cited by 3015.

On the Variance of the Adaptive Learning Rate and Beyond
L Liu, H Jiang, P He, W Chen, X Liu, J Gao, J Han
arXiv preprint arXiv:1908.03265, 2019. Cited by 2491.

Multi-Task Deep Neural Networks for Natural Language Understanding
X Liu, P He, W Chen, J Gao
arXiv preprint arXiv:1901.11504, 2019. Cited by 1521.

What Makes Good In-Context Examples for GPT-3?
J Liu, D Shen, Y Zhang, B Dolan, L Carin, W Chen
arXiv preprint arXiv:2101.06804, 2021. Cited by 1315.

DeBERTaV3: Improving DeBERTa Using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
P He, J Gao, W Chen
arXiv preprint arXiv:2111.09543, 2021. Cited by 1073.

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
M Abdin, J Aneja, H Awadalla, A Awadallah, AA Awan, N Bach, A Bahree, ...
arXiv preprint arXiv:2404.14219, 2024. Cited by 792.

SMART: Robust and Efficient Fine-Tuning for Pre-Trained Natural Language Models through Principled Regularized Optimization
H Jiang, P He, W Chen, X Liu, J Gao, T Zhao
arXiv preprint arXiv:1911.03437, 2019. Cited by 515.

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Q Zhang, M Chen, A Bukharin, N Karampatziakis, P He, Y Cheng, ...
arXiv preprint arXiv:2303.10512, 2023. Cited by 470.

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
B Peng, M Galley, P He, H Cheng, Y Xie, Y Hu, Q Huang, L Liden, Z Yu, ...
arXiv preprint arXiv:2302.12813, 2023. Cited by 425.

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
W Zhong, R Cui, Y Guo, Y Liang, S Lu, Y Wang, A Saied, W Chen, ...
arXiv preprint arXiv:2304.06364, 2023. Cited by 351.

ReasoNet: Learning to Stop Reading in Machine Comprehension
Y Shen, PS Huang, J Gao, W Chen
Proceedings of the 23rd ACM SIGKDD international conference on knowledge …, 2017. Cited by 339.

CodeT: Code Generation with Generated Tests
B Chen, F Zhang, A Nguyen, D Zan, Z Lin, JG Lou, W Chen
arXiv preprint arXiv:2207.10397, 2022. Cited by 313.

Understanding the Difficulty of Training Transformers
L Liu, X Liu, J Gao, W Chen, J Han
arXiv preprint arXiv:2004.08249, 2020. Cited by 304.

On the Advance of Making Language Models Better Reasoners
Y Li, Z Lin, S Zhang, Q Fu, B Chen, JG Lou, W Chen
arXiv preprint arXiv:2206.02336, 2022. Cited by 300.

Short Text Conceptualization Using a Probabilistic Knowledgebase
Y Song, H Wang, Z Wang, H Li, W Chen
Proceedings of the twenty-second international joint conference on …, 2011. Cited by 295.

Diffusion-GAN: Training GANs with Diffusion
Z Wang, H Zheng, P He, W Chen, M Zhou
arXiv preprint arXiv:2206.02262, 2022. Cited by 250.

Generation-Augmented Retrieval for Open-Domain Question Answering
Y Mao, P He, X Liu, Y Shen, J Gao, J Han, W Chen
arXiv preprint arXiv:2009.08553, 2020. Cited by 250.

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
G Yang, E Hu, I Babuschkin, S Sidor, X Liu, D Farhi, N Ryder, J Pachocki, ...
Advances in Neural Information Processing Systems 34, 17084-17097, 2021. Cited by 242.

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Z Gou, Z Shao, Y Gong, Y Shen, Y Yang, N Duan, W Chen
arXiv preprint arXiv:2305.11738, 2023. Cited by 236.