Sci-Hub, the web service hosting over 60 million academic papers, released the DOIs of its article holdings earlier this year.
Here I want to do some simple analysis of this dataset and look at how many papers published by AGU are part of the collection. Keep a few things in mind:
- I believe in acquiring papers through legal means and do not advocate searching for or using illegally distributed copies (through Sci-Hub or ResearchGate). Two great new tools for finding free versions of manuscripts are Unpaywall and oadoi.
- AGU journal papers published between 1997 and 24 months ago are freely available, so many of the AGU papers on Sci-Hub are already free.
OK, back to the Sci-Hub dataset. Article Digital Object Identifiers (DOIs) are broken into two parts: a prefix and a suffix. As I understand it, before AGU journals were published by Wiley, AGU articles used the prefix 10.1029, so I first extracted all entries with this older prefix. This effectively restricts the analysis to pre-2013 AGU articles (I'm not sure exactly when the change occurred), though some of these older articles may have been published before the 1997 'open access' cutoff.
I was left with a list of 171,752 articles (from the original 62 million).
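The prefix filter described above can be sketched in Python. This is a minimal sketch, not the original Julia code: it assumes the DOI list is plain text with one DOI per line, and `split_doi` and `agu_dois` are hypothetical helper names. A DOI's prefix is everything before the first `/`, so the filter only needs to compare that part against 10.1029.

```python
from typing import Iterable, List, Tuple

# Pre-Wiley AGU DOI prefix, as described in the text above.
AGU_PREFIX = "10.1029"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first '/'."""
    prefix, _, suffix = doi.strip().partition("/")
    return prefix, suffix


def agu_dois(lines: Iterable[str]) -> List[str]:
    """Keep only the DOIs whose prefix is the AGU prefix, skipping blank lines."""
    kept = []
    for line in lines:
        doi = line.strip()
        if not doi:
            continue
        prefix, suffix = split_doi(doi)
        if prefix == AGU_PREFIX and suffix:
            kept.append(doi)
    return kept
```

Because `agu_dois` accepts any iterable of lines, an open file handle works directly, e.g. `with open("scihub_dois.txt") as f: agu = agu_dois(f)`, where the filename is hypothetical.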
The suffix of an AGU DOI identifies a single article in a specific journal, and a letter code within the suffix denotes the journal. For example, a GRL article has a suffix containing 'GL' or 'gl', as in this DOI: 10.1029/2006GL028162
Each AGU journal has a unique letter combination in the suffix, so the DOIs can be counted by journal based on this code. Here is the line of Julia code that I used to search through the list of '10.1029' DOIs to find GRL articles.
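A rough Python analogue of that GRL search (not the original Julia line) is a case-insensitive substring test on the suffix. This sketch assumes the input is the already-filtered list of 10.1029 DOIs; `is_grl` and `count_grl` are hypothetical names.

```python
from typing import Iterable


def is_grl(doi: str) -> bool:
    """True if the DOI suffix contains the GRL code 'GL' in either case."""
    _, _, suffix = doi.partition("/")
    return "gl" in suffix.lower()


def count_grl(dois: Iterable[str]) -> int:
    """Count the DOIs that look like GRL articles."""
    return sum(1 for doi in dois if is_grl(doi))
```

A plain substring test is simple but could in principle match 'gl' anywhere in a suffix; anchoring the match to the letter code right after the year is stricter.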
Parsing the 171,752 articles into specific journals yields:
- Among these AGU journals, GRL has the most articles on Sci-Hub. Note also the huge volume of Eos articles (?!?) and of Water Resources Research articles.
- The older, higher-volume JGR sections (Oceans, Atmospheres, Solid Earth, and Space Physics) outweigh the newer, smaller sections (Biogeosciences, Earth Surface, Planets). Here are some JGR publication stats.
- About 5% of the 171,752 AGU DOIs did not match any journal code; they might be books, chapters, or other documents.
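The per-journal breakdown and the leftover share can be sketched as a single tally. Rather than a substring search, this Python sketch takes the first two letters after any leading year digits as the journal code, and puts everything else in an "unmatched" bucket. The code-to-journal table is my assumption based on DOI patterns like 10.1029/2006GL028162, not an official AGU list, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# ASSUMED mapping of two-letter suffix codes to journals; verify before relying on it.
JOURNAL_CODES: Dict[str, str] = {
    "GL": "Geophysical Research Letters",
    "WR": "Water Resources Research",
    "EO": "Eos",
    "JA": "JGR Space Physics",
    "JB": "JGR Solid Earth",
    "JC": "JGR Oceans",
    "JD": "JGR Atmospheres",
    "JE": "JGR Planets",
    "JF": "JGR Earth Surface",
    "JG": "JGR Biogeosciences",
}

# Optional leading digits (the year), then the first two letters of the suffix.
CODE_RE = re.compile(r"^\d*([A-Za-z]{2})")


def journal_of(doi: str) -> Optional[str]:
    """Return the journal name for a DOI, or None if its code is not in the table."""
    _, _, suffix = doi.partition("/")
    match = CODE_RE.match(suffix)
    if match is None:
        return None
    return JOURNAL_CODES.get(match.group(1).upper())


def tally(dois: Iterable[str]) -> Counter:
    """Count DOIs per journal, with unrecognized codes under 'unmatched'."""
    counts: Counter = Counter()
    for doi in dois:
        counts[journal_of(doi) or "unmatched"] += 1
    return counts


def unmatched_share(counts: Counter) -> float:
    """Fraction of all tallied DOIs that fell into the 'unmatched' bucket."""
    total = sum(counts.values())
    return counts["unmatched"] / total if total else 0.0
```

Run on the filtered 10.1029 list, `unmatched_share(tally(agu))` gives the fraction of DOIs without a recognized journal code, the quantity the roughly 5% figure above describes.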
Here is another interesting article on Sci-Hub, from Science (Bohannon, 2016): "Who's Downloading Pirated Papers? Everyone"