Paul Röttger
Postdoctoral Researcher, Bocconi University
Verified email at unibocconi.it
Title
Cited by
Year
HateCheck: Functional Tests for Hate Speech Detection Models
P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert
ACL 2021 (Main), 2021
Cited by 184, 2021
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
P Röttger, B Vidgen, D Hovy, JB Pierrehumbert
NAACL 2022 (Main), 2022
Cited by 86, 2022
SemEval-2023 Task 10: Explainable Detection of Online Sexism
HR Kirk, W Yin, B Vidgen, P Röttger
ACL 2023 (Main), 2023
Cited by 70, 2023
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media
P Röttger, JB Pierrehumbert
EMNLP 2021 (Findings), 2021
Cited by 49, 2021
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
NAACL 2022 (Main), 2021
Cited by 40, 2021
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
HR Kirk, B Vidgen, P Röttger, SA Hale
arXiv preprint arXiv:2303.05453, 2023
Cited by 38, 2023
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen
NAACL 2022 (WOAH), 2022
Cited by 27, 2022
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
arXiv preprint arXiv:2308.01263, 2023
Cited by 21, 2023
Safety-Tuned LLaMAs: Lessons from Improving the Safety of Large Language Models that Follow Instructions
F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ...
ICLR 2024, 2023
Cited by 18, 2023
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
P Röttger, D Nozza, F Bianchi, D Hovy
EMNLP 2022 (Main), 2022
Cited by 6, 2022
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
EMNLP 2023 (Main), 2023
Cited by 5, 2023
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
M Orlikowski, P Röttger, P Cimiano, D Hovy
ACL 2023 (Main), 2023
Cited by 3, 2023
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ...
arXiv preprint arXiv:2402.14499, 2024
Cited by 1, 2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger
arXiv preprint arXiv:2311.08370, 2023
Cited by 1, 2023
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore
J Haber, B Vidgen, M Chapman, V Agarwal, RKW Lee, YK Yap, P Röttger
ACL 2023 (Main), 2023
Cited by 1, 2023
Tracking abuse on Twitter against football players in the 2021–22 Premier League Season
B Vidgen, YL Chung, P Johansson, HR Kirk, A Williams, SA Hale, ...
Available at SSRN 4403913, 2022
Cited by 1, 2022
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
C Holtermann, P Röttger, T Dill, A Lauscher
arXiv preprint arXiv:2403.03814, 2024
2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy
arXiv preprint arXiv:2402.16786, 2024
2024
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models
HR Kirk, B Vidgen, P Röttger, SA Hale
NeurIPS 2023 (SoLaR Workshop), 2023
2023
Articles 1–19