Zhengyuan Yang

Cited by

	All	Since 2019
Citations	4844	4830
h-index	29	29
i10-index	36	36

Citations per year: 2019 (61), 2020 (130), 2021 (303), 2022 (629), 2023 (1710), 2024 (1991)

Public access

16 articles available, 0 articles not available

Based on funding mandates

Co-authors

Lijuan Wang - Microsoft GenAI - Verified email at microsoft.com
Jianfeng Wang - Microsoft - Verified email at microsoft.com
Zicheng Liu - Microsoft - Verified email at microsoft.com
Linjie (Lindsey) Li - Senior Researcher, Microsoft - Verified email at microsoft.com
Jiebo Luo - Albert Arendt Hopeman Professor of Engineering, University of Rochester - Verified email at cs.rochester.edu
Kevin Lin - Microsoft - Verified email at microsoft.com
Zhe Gan - Research Scientist, Apple - Verified email at apple.com
Ce Liu - AI Research Scientist Director, Meta GenAI; IEEE Fellow - Verified email at meta.com
Liwei Wang - Assistant Professor at The Chinese University of Hong Kong - Verified email at cse.cuhk.edu.hk
Jinsong Su - Xiamen University - Verified email at xmu.edu.cn
Jianwei Yang - Principal Researcher, Microsoft Research, Redmond - Verified email at microsoft.com
Jiajun Deng (邓家俊) - University of Adelaide, Australian Institute for Machine Learning - Verified email at adelaide.edu.au
Yuncheng Li - Google - Verified email at google.com
Chenglei Si - Stanford University - Verified email at stanford.edu
Boqing Gong - Research Scientist, Google - Verified email at google.com

Zhengyuan Yang

Researcher, Microsoft

Verified email at microsoft.com

Computer Vision, Multimedia, Vision + Language, Multimodal


Title	Authors	Venue	Cited by	Year
GIT: A generative image-to-text transformer for vision and language	J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang	Transactions on Machine Learning Research (TMLR), 2022	419	2022
A fast and accurate one-stage approach to visual grounding	Z Yang, B Gong, L Wang, W Huang, D Yu, J Luo	IEEE International Conference on Computer Vision (ICCV), 4683-4693, 2019	345	2019
An empirical study of GPT-3 for few-shot knowledge-based VQA	Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang	Proceedings of the AAAI conference on artificial intelligence 36 (3), 3081-3089, 2022	335	2022
The dawn of LMMs: Preliminary explorations with GPT-4V(ision)	Z Yang, L Li, K Lin, J Wang, CC Lin, Z Liu, L Wang	arXiv preprint arXiv:2309.17421 9 (1), 1, 2023	332	2023
TransVG: End-to-End Visual Grounding with Transformers	J Deng, Z Yang, T Chen, W Zhou, H Li	IEEE International Conference on Computer Vision (ICCV), 2021	281	2021
Scaling up vision-language pre-training for image captioning	X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	242	2022
MM-REACT: Prompting ChatGPT for multimodal reasoning and action	Z Yang, L Li, J Wang, K Lin, E Azarnasab, F Ahmed, Z Liu, C Liu, M Zeng, ...	arXiv preprint arXiv:2303.11381, 2023	240	2023
Improving One-stage Visual Grounding by Recursive Sub-query Construction	Z Yang, T Chen, L Wang, J Luo	European Conference on Computer Vision (ECCV), 2020	211	2020
MM-Vet: Evaluating large multimodal models for integrated capabilities	W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu, X Wang, L Wang	The 41st International Conference on Machine Learning (ICML), 2024	208	2024
Prompting GPT-3 to be reliable	C Si, Z Gan, Z Yang, S Wang, J Wang, J Boyd-Graber, L Wang	International Conference on Learning Representations (ICLR 23), 2022	186	2022
End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions	Z Yang, Y Zhang, J Yu, J Cai, J Luo	2018 24th international conference on pattern recognition (ICPR), 2289-2294, 2018	186	2018
Action recognition with spatio–temporal visual attention on skeleton image sequences	Z Yang, Y Li, J Yang, J Luo	IEEE Transactions on Circuits and Systems for Video Technology 29 (8), 2405-2415, 2018	184	2018
Attentive relational networks for mapping images to scene graphs	M Qi, W Li, Z Yang, Y Wang, J Luo	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3957-3966, 2019	170	2019
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption	Z Yang, Y Lu, J Wang, X Yin, D Florencio, L Wang, C Zhang, L Zhang, ...	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021	155	2021
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation	Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou, J Luo	Annual Meeting of the Association for Computational Linguistics (ACL), 2020	140	2020
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling	Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang	European Conference on Computer Vision (ECCV), 521-539, 2022	130*	2022
Multimodal foundation models: From specialists to general-purpose assistants	C Li, Z Gan, Z Yang, J Yang, L Li, L Wang, J Gao	Foundations and Trends® in Computer Graphics and Vision 16 (1-2), 1-214, 2024	106	2024
PromptCap: Prompt-guided image captioning for VQA with GPT-3	Y Hu, H Hua, Z Yang, W Shi, NA Smith, J Luo	Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	90*	2023
SAT: 2D Semantics Assisted Training for 3D Visual Grounding	Z Yang, S Zhang, L Wang, J Luo	IEEE International Conference on Computer Vision (ICCV), 2021	89	2021
ReCo: Region-Controlled Text-to-Image Generation	Z Yang, J Wang, Z Gan, L Li, K Lin, C Wu, N Duan, Z Liu, C Liu, M Zeng, ...	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023	82	2023
