Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions J Shen, R Pang, RJ Weiss, M Schuster, N Jaitly, Z Yang, Z Chen, Y Zhang, ... 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 842 | 2018 |
SpecAugment: A simple data augmentation method for automatic speech recognition DS Park, W Chan, Y Zhang, CC Chiu, B Zoph, ED Cubuk, QV Le arXiv preprint arXiv:1904.08779, 2019 | 649 | 2019 |
An introduction to computational networks and the computational network toolkit MS Dong Yu, Adam Eversole, Mike Seltzer, Kaisheng Yao, Zhiheng Huang, Brian ... Tech. Rep. MSR, Microsoft Research, 2014, http://codebox/cntk, 2014 | 424* | 2014 |
Very deep convolutional networks for end-to-end speech recognition Y Zhang, W Chan, N Jaitly 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 361 | 2017 |
Spoken language understanding using long short-term memory neural networks K Yao, B Peng, Y Zhang, D Yu, G Zweig, Y Shi IEEE SLT, 2014 | 270 | 2014 |
Highway long short-term memory RNNs for distant speech recognition Y Zhang, G Chen, D Yu, K Yao, S Khudanpur, J Glass 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016 | 268 | 2016 |
Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis Y Wang, D Stanton, Y Zhang, RJ Skerry-Ryan, E Battenberg, J Shor, Y Xiao, ... International Conference on Machine Learning, 5180-5189, 2018 | 234 | 2018 |
Transfer learning from speaker verification to multispeaker text-to-speech synthesis Y Jia, Y Zhang, RJ Weiss, Q Wang, J Shen, F Ren, Z Chen, P Nguyen, ... arXiv preprint arXiv:1806.04558, 2018 | 231 | 2018 |
Unsupervised learning of disentangled and interpretable representations from sequential data WN Hsu, Y Zhang, J Glass arXiv preprint arXiv:1709.07902, 2017 | 207 | 2017 |
Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM T Hori, S Watanabe, Y Zhang, W Chan arXiv preprint arXiv:1706.02737, 2017 | 188 | 2017 |
Training RNNs as fast as CNNs T Lei, Y Zhang, Y Artzi | 152 | 2018 |
Deep beamforming networks for multi-channel speech recognition X Xiao, S Watanabe, H Erdogan, L Lu, J Hershey, ML Seltzer, G Chen, ... 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016 | 128 | 2016 |
Simple recurrent units for highly parallelizable recurrence T Lei, Y Zhang, SI Wang, H Dai, Y Artzi arXiv preprint arXiv:1709.02755, 2017 | 120 | 2017 |
LibriTTS: A corpus derived from LibriSpeech for text-to-speech H Zen, V Dang, R Clark, Y Zhang, RJ Weiss, Y Jia, Z Chen, Y Wu arXiv preprint arXiv:1904.02882, 2019 | 102 | 2019 |
I-Vector Based Clustering Training Data in Speech Recognition Q Huo, ZJ Yan, Y Zhang, J Xu US Patent App. 13/640,804, 2015 | 102 | 2015 |
Learning latent representations for speech generation and transformation WN Hsu, Y Zhang, J Glass arXiv preprint arXiv:1704.04222, 2017 | 99 | 2017 |
Hierarchical generative modeling for controllable speech synthesis WN Hsu, Y Zhang, RJ Weiss, H Zen, Y Wu, Y Wang, Y Cao, Y Jia, Z Chen, ... arXiv preprint arXiv:1810.07217, 2018 | 81 | 2018 |
Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation WN Hsu, Y Zhang, J Glass 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 16-23, 2017 | 78 | 2017 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 73 | 2019 |
Extracting deep neural network bottleneck features using low-rank matrix factorization Y Zhang, E Chuangsuwanich, J Glass 2014 IEEE International Conference on Acoustics, Speech and Signal …, 2014 | 67 | 2014 |