Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent
X Lian, C Zhang, H Zhang, CJ Hsieh, W Zhang, J Liu
Advances in Neural Information Processing Systems 30, 2017
Cited by: 789; year: 2017

Asynchronous parallel stochastic gradient for nonconvex optimization
X Lian, Y Huang, Y Li, J Liu
Advances in Neural Information Processing Systems, 2737-2745, 2015
Cited by: 455; year: 2015

Asynchronous decentralized parallel stochastic gradient descent
X Lian, W Zhang, C Zhang, J Liu
International Conference on Machine Learning, 3043-3052, 2018
Cited by: 358; year: 2018

Staleness-aware Async-SGD for Distributed Deep Learning
W Zhang, S Gupta, X Lian, J Liu
International Joint Conference on Artificial Intelligence, 2016
Cited by: 266; year: 2016

D²: Decentralized Training over Decentralized Data
H Tang, X Lian, M Yan, C Zhang, J Liu
International Conference on Machine Learning, 4848-4856, 2018
Cited by: 264; year: 2018

Doublesqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression
H Tang, C Yu, X Lian, T Zhang, J Liu
International Conference on Machine Learning, 6155-6165, 2019
Cited by: 175; year: 2019

A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order
X Lian, H Zhang, CJ Hsieh, Y Huang, J Liu
Advances in Neural Information Processing Systems, 2016
Cited by: 89; year: 2016

Finite-sum Composition Optimization via Variance Reduced Gradient Descent
X Lian, M Wang, J Liu
Artificial Intelligence and Statistics, 2017
Cited by: 76; year: 2017

Asynchronous Parallel Greedy Coordinate Descent
Y You*, X Lian* (equal contribution), J Liu, HF Yu, I Dhillon, J Demmel, ...
Advances in Neural Information Processing Systems, 2016
Cited by: 47; year: 2016

Revisit batch normalization: New understanding and refinement via composition optimization
X Lian, J Liu
The 22nd International Conference on Artificial Intelligence and Statistics …, 2019
Cited by: 35; year: 2019

1-bit Adam: Communication efficient large-scale training with Adam’s convergence speed
H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ...
International Conference on Machine Learning, 10118-10129, 2021
Cited by: 23; year: 2021

Stochastic recursive momentum for policy gradient methods
H Yuan, X Lian, J Liu, Y Zhou
arXiv preprint arXiv:2003.04302, 2020
Cited by: 20; year: 2020

Efficient smooth non-convex stochastic compositional optimization via stochastic recursive gradient descent
W Hu, CJ Li, X Lian, J Liu, H Yuan
Advances in Neural Information Processing Systems 32, 2019
Cited by: 15; year: 2019

NMR evidence for field-induced ferromagnetism in (Li0.8Fe0.2)OHFeSe superconductor
YP Wu, D Zhao, XR Lian, XF Lu, NZ Wang, XG Luo, XH Chen, T Wu
Physical Review B 91 (12), 125107, 2015
Cited by: 12; year: 2015

Bagua: scaling up distributed learning with system relaxations
S Gan, X Lian, R Wang, J Chang, C Liu, H Shi, S Zhang, X Li, T Sun, ...
arXiv preprint arXiv:2107.01499, 2021
Cited by: 11; year: 2021

Persia: a hybrid system scaling deep learning based recommenders up to 100 trillion parameters
X Lian, B Yuan, X Zhu, Y Wang, Y He, H Wu, L Sun, H Lyu, C Liu, X Dong, ...
arXiv preprint arXiv:2111.05897, 2021
Cited by: 7; year: 2021

Stochastic recursive variance reduction for efficient smooth non-convex compositional optimization
H Yuan, X Lian, J Liu
arXiv preprint arXiv:1912.13515, 2019
Cited by: 5; year: 2019

Revisit batch normalization: New understanding from an optimization view and a refinement via composition optimization
X Lian, J Liu
arXiv preprint arXiv:1810.06177, 2018
Cited by: 5; year: 2018

Staleness-aware Async-SGD for Distributed Deep Learning. CoRR abs/1511.05950 (2015)
W Zhang, S Gupta, X Lian, J Liu
arXiv preprint arXiv:1511.05950, 2015
Cited by: 5; year: 2015

Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters
X Lian, B Yuan, X Zhu, Y Wang, Y He, H Wu, L Sun, H Lyu, C Liu, X Dong, ...
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022
Cited by: 3; year: 2022