November 29, 2017

Transparency and Open Science


This fall I had an extended conversation with Tim Parker from Whitman College, who is Co-organizer of the “Tools for Transparency in Ecology and Evolution” group. Here’s a brief summary of the TTEE goals:

“In November 2015, representatives (mostly editors-in-chief) from more than 20 journals in ecology and evolution joined researchers and funding agency panelists to identify ways to improve transparency in these disciplines. This workshop (funded by the US National Science Foundation and by the Laura and John Arnold Foundation and hosted by the Center for Open Science) identified general principles and specific tools that journals can adopt to encourage greater transparency of the science they publish. Most of the ideas that emerged from the workshop rest within the recently developed Transparency and Openness Promotion (TOP) framework (https://cos.io/top/). At this time, TOP contains eight separate editorial guidelines for journals, each designed to be adoptable by any empirical discipline”

Tim and I discussed the eight TOP guidelines, whose details can be found here. Below is a summary of where we stand on each and where I would like to go in the coming year. I would welcome your thoughts on this process.

1.     Citation Standards: AmNat is already at the highest “Level 3”, which stipulates that “Article is not published until providing appropriate citation for data and materials” (e.g., a DOI for the data on Dryad or elsewhere). Rare exceptions are granted when authors obtain waivers from posting data (e.g., for ongoing analyses of long-term datasets, or for human data where privacy requirements prohibit data sharing).

2.     Data Transparency: AmNat is at the basic “Level 1” (“Article states whether data are available and, if so, where to access them.”). The higher Level 2 requires that all data are available without exception, which I do not think is feasible (e.g., human genetic studies often prohibit data sharing because of privacy concerns under human subjects regulations). We are 95% of the way there, but I don’t think that last 5% is likely to happen.

3.     Code Transparency: We will implement the basic “Level 1” (the article states whether code used in analysis or simulation is publicly available, and if so where). If the community supports the move, I would be okay with shifting toward Level 2 in the near future (“Code must be posted to a trusted repository. Exceptions must be identified at article submission.”). I find it odd, by the way, that code transparency allows exceptions while data transparency does not. My own unfortunate experience (which I wrote about here as a means of catharsis) is that code can contain mistakes. Transitioning to a code repository will help encourage authors to proofread their code as carefully as they proofread their prose, thereby (hopefully) catching minor code mistakes that can have major effects on conclusions.

4.     Materials Transparency: The reagents and materials used across our set of disciplines are so heterogeneous that I did not see particular merit in requiring that the “Article states whether materials are available and, if so, where to access them.” So we will not presently aim for this standard.

5.     Design and Analysis Transparency: We will soon institute Level 1 (“Journal articulates design transparency standards.”). This will entail a checklist (e.g., confirming that the paper clearly reports sample sizes, effect sizes, which statistical tests were used, etc.) that authors fill out upon resubmission of a revised manuscript. Higher levels require these checklists before we even review a paper the first time, which I think is a disincentive for submissions. Having this happen upon resubmission is a good way to help guide authors through careful revision. The checklist is currently a work in progress, building on a TTEE template.

6.     Study Plan Pre-registration: Pre-registration is the practice of publicly reporting one’s experimental design in advance of executing the project and publishing. This allows readers to be confident that the published work was indeed designed for the stated goal, rather than being some post-hoc reinterpretation. I see the value in this goal, but I also strongly believe that biology often surprises us with unlooked-for insights. Consequently, I have no interest in requiring pre-registration. Neither do I see a downside to allowing authors to state whether and where they pre-registered. TOP’s “Level 1” requires merely that papers state whether or not the study design was pre-registered and, if so, where the pre-registered plan can be accessed. There would be no penalty for studies that were not pre-registered.

7.     Analysis Plan Pre-registration: As with study plan pre-registration, authors have the opportunity to record their statistical plans before obtaining data. This reduces the temptation to ‘P-hack’, that is, to keep running analyses until a significant result is found. Again, I see value in allowing our data to surprise us and lead us to new questions, so I do not intend to require pre-registered analysis. But I am fine with papers reporting whether or not they were pre-registered. So, I am planning on adopting TOP’s Level 1 standard: papers state whether or not the analyses were pre-registered, and if so, where the pre-registration can be found. Post-hoc analyses are still allowed, and manuscripts without pre-registration would still have equal footing with those that did pre-register.

8.     Replication: TOP’s replication guideline states that the basic level is that the “Journal encourages submission of replication studies.” This would be a big break from The American Naturalist’s tradition, as our brand is based on conceptual novelty and so generally we do not accept studies that duplicate a prior result in a new species, let alone in the same species as a prior study. Consequently, at present I do not intend to adopt this guideline.

I am entirely open to feedback on whether and how to institute these TOP guidelines. In particular, if anyone wants to help me craft some of the details (e.g., pre-registration statement formats, checklists), just drop me a note.

An additional topic that often comes up when we discuss open science is author page charges and publication fees versus open access. The American Naturalist is a non-profit journal published by the University of Chicago Press, unlike many of the larger journals published by for-profit institutions. We have converged on a hybrid model of publication. Authors who have sufficient resources to pay Open Access fees (which help defray the costs of production; see this blog post for an explanation) are welcome to do so. But many of our papers are written by graduate students or junior researchers who may not have substantial funds for publication charges. We seek to accommodate these authors as well, by allowing people to publish non-open access and pay very low page charges. People who truly don't have the financial means to pay even these small charges can get waivers. By taking this model, we remove barriers to entry for aspiring authors who may not be able to afford the publishing costs of several thousand dollars seen at some other journals. We also make it possible for authors with means to go Open Access to reach a broader audience.

- Dan Bolnick
Incoming Editor-In-Chief





2 comments:

  1. many psychology journals require data deposition despite routinely dealing with human data and privacy issues. There are solutions for that. So it really seems odd that an ecology and evolution journal would use that as an excuse to not be more committed to data sharing.

    Replies
    1. That's a fair point, but the difference is that a human genome sequence can, with some analyses, be traced back to the individual who participated in a study, or at least to their family. That is, if any of your close relatives took a 23andMe test or an Ancestry.com test (for example), then your genome sequence published with a study could be compared to databases to identify, at least to the level of family, who was in a study. In the future that might impact individuals' insurability, etc. So I am told by human genetics folks (I am married to one) that some IRBs and some social communities place moratoriums on the kind of data sharing that we are used to. A researcher who shares data in the way we normally expect might find that a tribal group subsequently stops cooperating with them. So there are genuine barriers here, and some geneticists do find this to be a deal-breaker for these reasons. It is more than an excuse.
