Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2192 | 2023 |
Deep learning for audio signal processing H Purwins, B Li, T Virtanen, J Schlüter, SY Chang, T Sainath IEEE Journal of Selected Topics in Signal Processing 13 (2), 206-219, 2019 | 895 | 2019 |
Streaming end-to-end speech recognition for mobile devices Y He, TN Sainath, R Prabhavalkar, I McGraw, R Alvarez, D Zhao, ... ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 748 | 2019 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 689 | 2024 |
A streaming on-device end-to-end model surpassing server-side conventional model quality and latency TN Sainath, Y He, B Li, A Narayanan, R Pang, A Bruguier, S Chang, W Li, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 229 | 2020 |
Towards fast and accurate streaming end-to-end ASR B Li, S Chang, TN Sainath, R Pang, Y He, T Strohman, Y Wu ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 137 | 2020 |
A better and faster end-to-end model for streaming asr B Li, A Gulati, J Yu, TN Sainath, CC Chiu, A Narayanan, SY Chang, ... ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 132 | 2021 |
Robust CNN-based speech recognition with Gabor filter kernels. SY Chang, N Morgan Interspeech, 905-909, 2014 | 101 | 2014 |
Fastemit: Low-latency streaming asr with sequence-level emission regularization J Yu, CC Chiu, B Li, S Chang, TN Sainath, Y He, A Narayanan, W Han, ... ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 96 | 2021 |
Personal VAD: Speaker-conditioned voice activity detection S Ding, Q Wang, S Chang, L Wan, IL Moreno arXiv preprint arXiv:1908.04284, 2019 | 93 | 2019 |
Temporal modeling using dilated convolution and gating for voice-activity-detection SY Chang, B Li, G Simko, TN Sainath, A Tripathi, A van den Oord, ... 2018 IEEE international conference on acoustics, speech and signal …, 2018 | 87 | 2018 |
Improved End-of-Query Detection for Streaming Speech Recognition. M Shannon, G Simko, SY Chang, C Parada Interspeech, 1909-1913, 2017 | 52 | 2017 |
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling. TN Sainath, Y He, A Narayanan, R Botros, R Pang, D Rybach, C Allauzen, ... Interspeech 8, 1777-1781, 2021 | 47 | 2021 |
Joint endpointing and decoding with end-to-end models SY Chang, R Prabhavalkar, Y He, TN Sainath, G Simko ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 44 | 2019 |
Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition. SY Chang, B Li, TN Sainath, G Simko, C Parada Interspeech, 3812-3816, 2017 | 37 | 2017 |
Streaming end-to-end multilingual speech recognition with joint language identification C Zhang, B Li, T Sainath, T Strohman, S Mavandadi, S Chang, P Haghani arXiv preprint arXiv:2209.06058, 2022 | 28 | 2022 |
Improving the latency and quality of cascaded encoders TN Sainath, Y He, A Narayanan, R Botros, W Wang, D Qiu, CC Chiu, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 26 | 2022 |
Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction SY Chang, BT Meyer, N Morgan 2013 IEEE international conference on acoustics, speech and signal …, 2013 | 22 | 2013 |
Turn-taking prediction for natural conversational speech S Chang, B Li, TN Sainath, C Zhang, T Strohman, Q Liang, Y He arXiv preprint arXiv:2208.13321, 2022 | 21 | 2022 |
E2e segmenter: Joint segmenting and decoding for long-form asr WR Huang, S Chang, D Rybach, R Prabhavalkar, TN Sainath, C Allauzen, ... arXiv preprint arXiv:2204.10749, 2022 | 20 | 2022 |