Hannah Rose Kirk

Cited by

	All	Since 2019
Citations	682	682
h-index	12	12
i10-index	14	14

Citations per year (from chart): 2021: 8; 2022: 67; 2023: 419; 2024: 180

Public access

2 articles available
0 articles not available

Based on funding mandates

Co-authors

Bertie Vidgen - Oxford, Turing - Verified email at rewire.online
Paul Röttger - Postdoctoral Researcher, Bocconi University - Verified email at unibocconi.it
Aleksandar (Suny) Shtedritski - PhD student, University of Oxford - Verified email at robots.ox.ac.uk
Scott A. Hale - Oxford Internet Institute, University of Oxford, Meedan, and the Alan Turing Institute - Verified email at oii.ox.ac.uk
Yuki M. Asano - Assistant Professor, University of Amsterdam - Verified email at uva.nl
Yennie Jun - Google Research, Truveta, University of Oxford, UN Global Pulse - Verified email at google.com
Frédéric A. Dreyer - University of Oxford - Verified email at physics.ox.ac.uk
Siobhan Mackenzie Hall - DPhil Student, University of Oxford - Verified email at nds.ox.ac.uk
Leon Derczynski - ITU Copenhagen & NVIDIA - Verified email at itu.dk
Max Bain - University of Oxford - Verified email at robots.ox.ac.uk
Jonas Schuett - Research Fellow, Centre for the Governance of AI, Oxford, UK - Verified email at governance.ai
Luciano Floridi - Yale University - Alma Mater Studiorum University of Bologna - Verified email at yale.edu
Jakob Mökander - University of Oxford - Verified email at oii.ox.ac.uk
Tristan Thrush - Stanford - Verified email at stanford.edu
Wenjie Yin - Queen Mary University of London - Verified email at qmul.ac.uk
abeba birhane - Adjunct assistant professor at the school of computer science and statistics, Trinity College Dublin - Verified email at tcd.ie
Yash Bhalgat - Visual Geometry Group, University of Oxford - Verified email at robots.ox.ac.uk
Hugo Berg - Undergraduate student, Mathematics & Computer Science, University of Oxford - Verified email at ccc.ox.ac.uk
Dirk Hovy - Bocconi University - Verified email at unibocconi.it
Noah Broestl - Google Research and Oxford Uehiro Centre for Practical Ethics - Verified email at google.com

Hannah Rose Kirk

University of Oxford

Verified email at oii.ox.ac.uk

Large language models, NLP, Ethics in AI, Alignment, AI Safety


Title	Authors	Venue	Cited by	Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models	HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ...	Advances in neural information processing systems 34, 2611-2624, 2021	124	2021
Auditing large language models: a three-layered approach	J Mökander, J Schuett, HR Kirk, L Floridi	AI and Ethics, 1-31, 2023	105	2023
Dataperf: Benchmarks for data-centric AI development	M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ...	Advances in Neural Information Processing Systems 36, 2024	70	2024
SemEval-2023 task 10: explainable detection of online sexism	HR Kirk, W Yin, B Vidgen, P Röttger	arXiv preprint arXiv:2303.04222, 2023	70	2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning	H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain	Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022	62	2022
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate	HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale	Proceedings of the 2022 Conference of the North American Chapter of the …, 2021	43	2021
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback	HR Kirk, B Vidgen, P Röttger, SA Hale	arXiv preprint arXiv:2303.05453, 2023	41	2023
Handling and Presenting Harmful Text in NLP	HR Kirk, A Birhane, B Vidgen, L Derczynski	EMNLP Findings, 2022	28*	2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements	C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk	Proceedings of the 4th workshop on gender bias in natural language …, 2022	24	2022
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset	HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ...	Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021	23	2021
Xstest: A test suite for identifying exaggerated safety behaviours in large language models	P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy	arXiv preprint arXiv:2308.01263, 2023	21	2023
Assessing language model deployment with risk cards	L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ...	arXiv preprint arXiv:2303.18190, 2023	15	2023
Casteist but not racist? Quantifying disparities in large language model bias between India and the West	K Khandelwal, M Tonneau, AM Bean, HR Kirk, SA Hale	arXiv preprint arXiv:2309.08573, 2023	10	2023
The nuances of Confucianism in technology policy: An inquiry into the interaction between cultural and political systems in Chinese digital ethics	HR Kirk, K Lee, C Micallef	International Journal of Politics, Culture, and Society, 1-24, 2020	10	2020
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets	B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain	arXiv preprint arXiv:2305.15407, 2023	8	2023
The past, present and better future of feedback learning in large language models for subjective human preferences and values	HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale	arXiv preprint arXiv:2310.07629, 2023	6	2023
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning	HR Kirk, B Vidgen, SA Hale	Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying …, 2022	6	2022
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models	P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy	arXiv preprint arXiv:2402.16786, 2024	3	2024
Visogender: A dataset for benchmarking gender bias in image-text pronoun resolution	SM Hall, F Gonçalves Abrantes, H Zhu, G Sodunke, A Shtedritski, HR Kirk	Advances in Neural Information Processing Systems 36, 2024	3	2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models	B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger	arXiv preprint arXiv:2311.08370, 2023	3	2023
