Five questions (and answers) about preprints

(This post originally appeared on the Coast and Ocean Collective Blog)

After the last post, Giovanni asked me to respond to five questions.

1) What about reviewing, are you saying that reviewing of scientific manuscripts is not necessary?

My previous post purposely avoids discussion of peer review. If you look at the graphic in the earlier post, taken from the Wouters et al. 2019 article, one role of a journal is to certify that published work has been evaluated. I think readers need to understand that preprints are not evaluated. On EarthArXiv, all manuscripts that are not peer-reviewed must carry a notice stating as much.

But thinking about peer review and evaluation brings up some deeper questions. Assuming that an article is true or valid just because it is typeset in a journal is a ridiculous idea: the burden of convincing a reader is on the author, based on their work. My point is that perhaps we should remember that most of us read and evaluate papers for ourselves, regardless of where they are published.

Additionally, most journals hide peer review reports (the Copernicus/EGU journals being an exception), so most of the time we are just assuming that peer review was performed adequately for a published paper. When review reports are published, a reader can actually read and understand the peer review process for a given paper. Hidden review reports force a reader to trust that the journal publisher, the editors, and the reviewers all did an adequate job.

Lastly, there are initiatives to decouple review from journals. For example, PREreview is an initiative focused on reviews for preprints. I think it’s a neat idea if journal clubs focus on preprints, and produce review reports that could be relayed to the author. Another example is PubPeer.

2) Are preprints just a stop-gap solution, or are they a general solution to problems in scholarly publishing?

I think preprints are part of a larger solution to some problems in scholarly publishing. There are bigger issues though, many having to do with the costs to authors (page charges; OA fees) and the cost of journal subscriptions for libraries. A clear next step is to start developing free open access journals, and potentially even journals that work on top of the preprint servers. Here are some really inspiring examples:

Journals like VOLCANICA are breaking new ground for the Earth sciences by creating a free, open access journal with a very low operating cost (500 euros per year for the entire journal) and transparency in the breakdown of those costs. See the journal's editorial for details.

Another journal with low operating costs and transparency about how those costs are used is JOSS, the Journal of Open Source Software, with operating costs of ~$3.50 per article. JOSS runs on GitHub, and its editorial describes the model and costs.

Lastly, an example of a journal that explicitly leverages preprint infrastructure is Discrete Analysis. An author submits an article to a preprint server, in this case arXiv, and tells the journal they would like to submit. The journal organizes reviews, and if the article is accepted, it is given a special LaTeX template to indicate it was reviewed, and the new version is resubmitted to arXiv.

3) What about papers that get rejected? Especially if a paper is rejected from a short-form journal and needs to go to a long-form journal?

Preprints have version control, so if the paper needs to change to respond to criticism, or the authors need to adjust formatting for a different journal, then the preprint is replaced with a new copy.

4) Can I cite preprints?

Yes, preprints can and should be cited. Preprints are given a DOI, and this can be used in the citation. EarthArXiv preprints are also indexed by Google Scholar. In the case that people have cited your preprint and you now have a published journal article, Dan Ibarra wrote a nice summary on Twitter on how to merge these two entries in Google Scholar.

5) Are you sure about how good it is for early career researchers? Are there risks?

I think preprints are really great for early career researchers. There are two issues with preprints that might be perceived as risks for researchers: first, a paper may be rejected and need to be reformatted for a new journal; second, a paper may contain an error that needs to be corrected. If there are errors in a paper, the author can easily issue a correction in the form of a new version of the preprint. Error correction becomes easier and hopefully faster than dealing with a journal article. Reformatting is solved in the same way: a new version of the preprint can be produced. Perhaps having multiple versions of a preprint might make a researcher feel self-conscious or embarrassed, but as a community we should destigmatize this process of versioning preprints (and all scholarly artifacts, for that matter). Error identification and correction is an important part of the scientific process, and journal choices are often not controlled by early career researchers. Regardless, it is likely that only a rare few people will dig through old preprint versions to determine the changes that authors made.

My personal belief is that the rewards of visibility for written scientific work outweigh anything that can be called a risk.

My Time at CSDMS 2019


(This post originally appeared on the Coast and Ocean Collective Blog)

In May I went to my first annual meeting of CSDMS, the Community Surface Dynamics Modeling System. It was great to see old friends and meet new ones.

CSDMS is involved in a range of different projects and provides a suite of different services to the earth surface processes modeling community. You might know about CSDMS from its model repository (with metadata and links to source code) and the handy tools developed by CSDMS to link models together. For more background on CSDMS, check out their webpage.

One nice aspect of CSDMS is that the keynotes and panels are recorded and put on YouTube, and many poster presenters upload PDFs of their poster. I have spent a few hours skimming through these videos and PDFs from past meetings — lots of interesting ideas.

The annual meeting theme this year was ‘Bridging Boundaries’, and there was a range of interesting talks, posters, clinics, breakout sessions, and panels. I want to mention just a few highlights from those 3 packed days.

  • I really enjoyed the wide range of keynotes. Two particularly interesting ones were:
  • I really enjoyed the 2 panel discussions:
  • A real highlight for me was Dan Buscombe’s deep learning clinic. Dan walked us through a comprehensive Jupyter notebook based on his work on pixel-scale image classification. It was great to hear Dan explain his workflow, and it was great to meet him in person. I urge you to check out his work!
  • There were too many amazing posters to cover in one post. I recommend scrolling through the abstracts and poster PDFs online.
  • I live-tweeted the 3rd day through the CSDMS and AGU EPSP Twitter accounts. This was really fun and I’m grateful for the opportunity from the AGU EPSP social media team.
  • I am very grateful to CSDMS for inviting me to give a keynote this year — it was exciting to share my ideas with such a talented group of people. My talk — video, slides — focused on ML work that I have done with the Coast and Ocean Collective (and others), specifically work on swash, runup, ‘hybrid’ models, and the ML review paper that was just published.
  • Lastly, I ate a lot of (good) pizza.



Signed and Unsigned Reviews in Earth Surface Dynamics

All of the reviews for Earth Surface Dynamics are open, published, and citable. Today I do a bit of webscraping to determine the mix of signed and blind reviews for the 198 papers reviewed in ESurfD. Also, since reviews occur in sequence (i.e., R1 submits their review before R2), we can examine how R1’s decision to sign a review influences the decision of R2.

The code to do the webscraping is here. Note that R is not my best language, but I am using it because of all the cool packages written for R to interface with Crossref (rcrossref, for obtaining publication DOIs), and the easy webscraping (rvest).

The code works by:

  1. Pulling (from Crossref) details for all ESurfD publications using the ISSN.
  2. Going to every ESurfD page (following the DOI link).
  3. Scraping the webpage for author, editor, and reviewer comments (see this helpful tutorial on using rvest).
  4. Checking for descriptive words, for instance “Anonymous Comment #1”, to determine if Reviewer 1 and/or Reviewer 2 were anonymous.
  5. Checking whether a Reviewer 3 exists, so that those papers can be excluded (I only want to deal with papers with 2 reviewers for this initial study).
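
Steps 4 and 5 can be sketched as below. This is a minimal Python illustration, not the R code linked above. The function name `classify_reviews`, its input (the referee comment titles already scraped from one discussion page, in the order they were posted), and the rule that a title containing "anonymous" marks an unsigned review are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def classify_reviews(referee_titles: List[str]) -> Optional[Tuple[bool, bool]]:
    """Return (r1_signed, r2_signed) for a paper with exactly two reviewers.

    Papers with any other number of referee comments return None so they
    can be excluded, mirroring step 5. The "anonymous" keyword test is an
    assumed stand-in for the descriptive-word check in step 4.
    """
    if len(referee_titles) != 2:
        return None
    r1_signed = "anonymous" not in referee_titles[0].lower()
    r2_signed = "anonymous" not in referee_titles[1].lower()
    return (r1_signed, r2_signed)


# Hypothetical titles, for illustration only.
example = ["Anonymous Comment #1", "Review by A. Person"]
print(classify_reviews(example))
```

A keyword test like this is where the pathological cases tend to creep in (for example, a signed review whose title happens to mention the word "anonymous"), so a manual spot check of the output is still worthwhile.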

I imagine some specific pathological cases in review comments may have slipped through this code, but a cursory check shows it captures the relevant information. After the code runs, I am left with 135 papers with 2 reviewers, for a total of 270 reviews. In total, 41% of reviews are signed. This is close to previous reports, such as the 40% reported by Okal (2003) and the 40% reported by PeerJ.

  • Reviewer 1 totals are 74 unsigned, 61 signed (55% unsigned, 45% signed).
  • For the 74 papers where Review 1 is unsigned,
    • Reviewer 2 data is 59 unsigned, 15 signed (80% unsigned, 20% signed).
  • For the 61 papers where Review 1 is signed,
    • Reviewer 2 data is 27 unsigned, 34 signed (44% unsigned, 56% signed).
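
The percentages can be recomputed directly from the four counts. The short Python sketch below does only that arithmetic; the dictionary layout and the helper name `share_r2_signed` are illustrative choices, not part of the original scraping code.

```python
# Counts from the list above, keyed by (Reviewer 1 choice, Reviewer 2 choice).
counts = {
    ("unsigned", "unsigned"): 59,
    ("unsigned", "signed"): 15,
    ("signed", "unsigned"): 27,
    ("signed", "signed"): 34,
}


def share_r2_signed(r1_choice: str) -> float:
    """Fraction of papers where Reviewer 2 signed, given Reviewer 1's choice."""
    signed = counts[(r1_choice, "signed")]
    unsigned = counts[(r1_choice, "unsigned")]
    return signed / (signed + unsigned)


total_papers = sum(counts.values())
# Signed reviews from Reviewer 1 plus signed reviews from Reviewer 2.
r1_signed = counts[("signed", "unsigned")] + counts[("signed", "signed")]
r2_signed = counts[("unsigned", "signed")] + counts[("signed", "signed")]
overall_signed = (r1_signed + r2_signed) / (2 * total_papers)

print(round(100 * share_r2_signed("unsigned")))  # R2 signs after an unsigned R1
print(round(100 * share_r2_signed("signed")))    # R2 signs after a signed R1
print(round(100 * overall_signed))               # all reviews that are signed
```

With these counts, Reviewer 2 signs about 20% of the time after an unsigned Review 1 and about 56% of the time after a signed one, and about 41% of all 270 reviews are signed.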

There is one clear confounding factor here, which is how positive/negative reviews impact the likelihood of signing a review (both for R1 and R2). I imagine referee suggestions to the editor (e.g., minor revisions, major revisions, reject) and/or text mining could provide some details. (I can think of a few other confounds beyond this one.) Furthermore, I would assume that since many (all?) journals from Copernicus/EGU have open review, this analysis could be scaled up.

Peer review model — results part 2

Backstory of the model:

This model is based on networks, so I’ll use some of the language and techniques from the study of networks to analyze the data. This peer review model creates a directed and weighted network. In other words, the ‘scientists’ (nodes) are connected (via edges) to other scientists (other nodes). The connections (edges) have a direction (how ‘scientist A’ feels toward ‘B’) and a weight (e.g., -3 for a negative feeling). The book-keeping for this model is an adjacency matrix.

A=\begin{pmatrix}0&5&-1&0\\ 1&0&2&0\\ 0&-3&0&0\\ 0&0&1&0\end{pmatrix}

Where A_{ij} denotes an edge from i to j with a given weight. In this model, it is the mood that scientist i has toward scientist j. (Some other texts use the reverse convention.)

A measurement for this sort of matrix is incoming and outgoing node strength. The outgoing strength of scientist i (how scientist i feels about all other scientists) can be denoted as:

s_{i}^{out}=\sum_{j \neq i} A_{ij}

It can be calculated by summing row i of A. The incoming strength of scientist i (how all other scientists feel about scientist i) can be denoted as:

s_{i}^{in}=\sum_{j \neq i} A_{ji}

It can be calculated by summing column i of A. (For reference, my previous post showed time series plots of the mean of incoming weights, similar to the strength metric we are talking about here, s_{i}^{in}.)
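
As a worked example, here are both strengths computed for the 4 x 4 matrix A shown earlier. This is a minimal Python sketch: the function names are illustrative, indices are zero-based (so index 0 is the first scientist), and a 0 entry is simply treated as a weight of 0.

```python
# Example adjacency matrix from the text: A[i][j] is the mood of
# scientist i toward scientist j.
A = [
    [0, 5, -1, 0],
    [1, 0, 2, 0],
    [0, -3, 0, 0],
    [0, 0, 1, 0],
]


def out_strength(A, i):
    """Sum of row i, skipping the diagonal: how i feels about everyone else."""
    return sum(A[i][j] for j in range(len(A)) if j != i)


def in_strength(A, i):
    """Sum of column i, skipping the diagonal: how everyone else feels about i."""
    return sum(A[j][i] for j in range(len(A)) if j != i)


s_out = [out_strength(A, i) for i in range(len(A))]
s_in = [in_strength(A, i) for i in range(len(A))]
print(s_out)  # [4, 3, -3, 1]
print(s_in)   # [1, 2, 2, 0]
```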

Signed reviewers can be polarizing: weights can quickly become very negative and/or very positive. So the strengths (s_{i}^{in} and s_{i}^{out}) will be a sum of extreme positives and negatives. This is not very descriptive, because the extremes can cancel and give a strength near 0. Instead I want to look at the range of incoming and outgoing weights, or:

R_{i}^{out}= \max\limits_{j \neq i} A_{ij} - \min\limits_{j \neq i} A_{ij} which denotes the maximum outgoing weight minus the minimum outgoing weight.

R_{i}^{in}=\max\limits_{j \neq i} A_{ji} - \min\limits_{j \neq i} A_{ji} which denotes the maximum incoming weight minus the minimum incoming weight.
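
The same example matrix shows why the range is more informative than the strength. The sketch below is again a minimal Python illustration with made-up function names and zero-based indices, treating a 0 entry as a weight of 0.

```python
# Example adjacency matrix from the text: A[i][j] is the mood of
# scientist i toward scientist j.
A = [
    [0, 5, -1, 0],
    [1, 0, 2, 0],
    [0, -3, 0, 0],
    [0, 0, 1, 0],
]


def out_range(A, i):
    """Max minus min of the off-diagonal entries in row i."""
    weights = [A[i][j] for j in range(len(A)) if j != i]
    return max(weights) - min(weights)


def in_range(A, i):
    """Max minus min of the off-diagonal entries in column i."""
    weights = [A[j][i] for j in range(len(A)) if j != i]
    return max(weights) - min(weights)


R_out = [out_range(A, i) for i in range(len(A))]
R_in = [in_range(A, i) for i in range(len(A))]
print(R_out)  # [6, 2, 3, 1]
print(R_in)   # [1, 8, 3, 0]
```

The scientist at index 1 receives weights of 5 and -3, so the incoming strength is only 2 while the incoming range is 8. The range keeps the polarization that the sum hides.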

Now let’s look at some model results, R_{i}^{out} and R_{i}^{in}, for each scientist.


  • Both types of reviewers have similar R_{i}^{out} — they tend to have a similar range in their opinions about the scientists in the discipline.
  • Signed reviewers tend to have a larger R_{i}^{in} (the range of feelings that other scientists have toward them) than those who do not sign reviews. Scientists tend to either like or dislike signed reviewers more strongly than unsigned reviewers.

An added feedback is coming….

Some inspiration for this work comes from:

The growing reference section of geomorphology articles

What is the mean number of references in a geomorphology journal article? Are reference sections growing (as the number of papers published increases)? I have previously looked at reference section size changes in JGR-ES and in length-restricted articles (i.e., GRL and Geology), but I am extending the analysis to the four geomorphology journals I commonly read: Geomorphology, Earth Surface Processes and Landforms, Earth Surface Dynamics, and Journal of Geophysical Research – Earth Surface.

You can see below that the mean reference section size is growing at a rate of ~2 additional references per year: