Nicholas Schiefer
Anthropic
Verified email at mit.edu
Title
Cited by
Year
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by: 577, Year: 2022
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by: 220, Year: 2022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by: 210, Year: 2022
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
Cited by: 139, Year: 2022
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
Cited by: 123, Year: 2022
The capacity for moral self-correction in large language models
D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, ...
arXiv preprint arXiv:2302.07459, 2023
Cited by: 92, Year: 2023
Towards measuring the representation of subjective global opinions in language models
E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ...
arXiv preprint arXiv:2306.16388, 2023
Cited by: 58, Year: 2023
Towards monosemanticity: Decomposing language models with dictionary learning
T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, N Turner, ...
Transformer Circuits Thread, 2, 2023
Cited by: 49, Year: 2023
Measuring progress on scalable oversight for large language models
SR Bowman, J Hyun, E Perez, E Chen, C Pettit, S Heiner, K Lukošiūtė, ...
arXiv preprint arXiv:2211.03540, 2022
Cited by: 41, Year: 2022
Towards understanding sycophancy in language models
M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...
arXiv preprint arXiv:2310.13548, 2023
Cited by: 35, Year: 2023
Measuring faithfulness in chain-of-thought reasoning
T Lanham, A Chen, A Radhakrishnan, B Steiner, C Denison, ...
arXiv preprint arXiv:2307.13702, 2023
Cited by: 32, Year: 2023
Universal Computation and Optimal Construction in the Chemical Reaction Network-Controlled Tile Assembly Model
N Schiefer, E Winfree
21st International Conference on DNA Computing and Molecular Programming …, 2015
Cited by: 26, Year: 2015
Question decomposition improves the faithfulness of model-generated reasoning
A Radhakrishnan, K Nguyen, A Chen, C Chen, C Denison, D Hernandez, ...
arXiv preprint arXiv:2307.11768, 2023
Cited by: 25, Year: 2023
FoundationDB Record Layer: A Multi-Tenant Structured Datastore
C Chrysafis, B Collins, S Dugas, J Dunkelberger, M Ehsan, S Gray, ...
Proceedings of the 2019 International Conference on Management of Data, 1787 …, 2019
Cited by: 22, Year: 2019
Exponentially improving the complexity of simulating the Weisfeiler-Lehman test with graph neural networks
A Aamand, J Chen, P Indyk, S Narayanan, R Rubinfeld, N Schiefer, ...
Advances in Neural Information Processing Systems 35, 27333-27346, 2022
Cited by: 14, Year: 2022
Superposition, memorization, and double descent
T Henighan, S Carter, T Hume, N Elhage, R Lasenby, S Fort, N Schiefer, ...
Transformer Circuits Thread, 2023
Cited by: 13, Year: 2023
Time Complexity of Computation and Construction in the Chemical Reaction Network-Controlled Tile Assembly Model
N Schiefer, E Winfree
22nd International Conference on DNA Computing and Molecular Programming …, 2016
Cited by: 9, Year: 2016
A fill estimation algorithm for sparse matrices and tensors in blocked formats
P Ahrens, H Xu, N Schiefer
2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2018
Cited by: 8, Year: 2018
Specific versus general principles for constitutional AI
S Kundu, Y Bai, S Kadavath, A Askell, A Callahan, A Chen, A Goldie, ...
arXiv preprint arXiv:2310.13798, 2023
Cited by: 7, Year: 2023
Sleeper agents: Training deceptive LLMs that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by: 6, Year: 2024