Large-Scale Study Successfully Replicates About Half of a Sample of Social and Behavioral Science Findings
Prof. Dr. Heinrich Nax co-authors a new study published in Nature whose implications point to uncertainty, not failure, and to a need for more cautious confidence in research claims.
A new study published in Nature, "Investigating the replicability of the social and behavioural sciences," reports results from a large-scale effort to test whether published findings hold up when tested again with new data. Overall, about half of the claims replicated successfully, and replications showed smaller effect sizes on average than the original studies.
The results come from the most systematic, cross-disciplinary replication effort of its kind to date, conducted as part of the Systematizing Confidence in Open Research and Evidence (SCORE) program funded by the Defense Advanced Research Projects Agency (DARPA). A collaboration of 292 researchers from 199 institutions around the world attempted to replicate findings from 164 papers published between 2009 and 2018 in well-known journals across several fields, including business, economics, education, political science, psychology, and sociology. Replications were designed to be rigorous, well-powered, preregistered, and transparent, and they went through peer review in advance to improve research quality.
Investigating the replicability of the social and behavioural sciences
Brian Nosek (Center for Open Science) et al.
First published online April 1st, 2026 in Nature
doi.org/10.1038/s41586-025-10078-y
Abstract
Pursuing replicability (independent evidence for previous claims) is important for creating generalizable knowledge. Here we attempted replications of 274 claims of positive results from 164 quantitative papers published from 2009 to 2018 in 54 journals in the social and behavioural sciences. Replications were high powered on average to detect the original effect size (median of 99.6%), used original materials when relevant and available, and were peer reviewed in advance through a standardized internal protocol. Replications showed statistically significant results in the original pattern for 151 of 274 claims (55.1% (95% CI 49.2–60.9%)) and for 80.8 of 164 papers (49.3% (95% CI 43.8–54.7%)) weighted for replicating multiple claims per paper. We observed modest variation in replication rates across disciplines (42.5–63.1%), although some estimates had high uncertainty. The median Pearson’s r effect size was 0.25 (95% CI 0.21–0.27) for original studies and 0.10 (95% CI 0.09–0.13) for replication studies, an 82.4% (95% CI 67.8–88.2%) reduction in shared variance. Thirteen methods for evaluating replication success provided estimates ranging from 28.6% to 74.8% (median of 49.3%). Some decline in effect size and significance is expected based on power to detect original effects and regression to the mean due to replicating only positive results. We observe that challenges for replicability extend across social-behavioural sciences, illustrating the importance of identifying conditions that promote or inhibit replicability.
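For readers who want to check the headline figures, the arithmetic in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the Wilson score interval is an assumed choice of confidence-interval method (the abstract does not name one), and the shared-variance calculation uses the rounded median r values quoted above, so it gives about 84% rather than the reported 82.4%, which presumably rests on unrounded values or a different aggregation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%).

    Assumption: the abstract does not state which interval method was used.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def shared_variance_reduction(r_original, r_replication):
    """Relative drop in shared variance (r squared) from original to replication."""
    return 1 - (r_replication**2) / (r_original**2)


# Figures quoted in the abstract.
claim_rate = 151 / 274          # claims with a significant result in the original direction
paper_rate = 80.8 / 164         # paper-level rate, weighted for multiple claims per paper
low, high = wilson_interval(151, 274)

# Rounded medians from the abstract; only an approximation of the reported reduction.
reduction = shared_variance_reduction(0.25, 0.10)

print(f"Claim-level replication rate: {claim_rate:.1%}")
print(f"Assumed Wilson 95% interval:  {low:.1%} to {high:.1%}")
print(f"Paper-level replication rate: {paper_rate:.1%}")
print(f"Shared-variance reduction from rounded medians: {reduction:.1%}")
```

Squaring r before comparing is what makes the drop look larger than the difference between 0.25 and 0.10 suggests: shared variance scales with the square of the correlation, so a median effect less than half the original size corresponds to a much larger proportional loss in explained variance.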