AGU publications on Sci-Hub

Sci-Hub, the web service with over 60 million academic papers, released the DOIs of its article holdings earlier this year.

The data was quickly put into a figshare repository (Hahnel, 2017), and some analysis has already been done by Greshake (2017) on this list of DOIs.

Here I want to do some simple analysis with this dataset and look at how many papers published by AGU are part of this collection. Keep in mind a few things:

  1. I believe in acquiring papers through legal means and do not advocate searching for or using illegally distributed copies (through Sci-Hub or ResearchGate). Two great new tools for finding free versions of manuscripts are Unpaywall and oadoi.
  2. AGU journal papers published from 1997 up to 24 months ago are freely available, so many of the papers on Sci-Hub are already free to read.
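Point 2 amounts to a date-window check. A minimal Julia sketch, assuming the window opens on 1 January 1997 and closes exactly 24 months before the date you check (AGU's precise rules may differ); `in_free_window` is a hypothetical helper name:

```julia
using Dates

# Hypothetical helper: true if a paper published on `pub` falls inside the assumed
# free-to-read window, i.e. on or after 1 Jan 1997 and at least 24 months before `asof`.
function in_free_window(pub::Date, asof::Date = Dates.today())
    window_start = Date(1997, 1, 1)   # assumed start of the free window
    window_end = asof - Month(24)     # assumed 24-month embargo
    return window_start <= pub <= window_end
end

in_free_window(Date(2006, 11, 1), Date(2017, 3, 1))   # → true
```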

OK, back to the Sci-Hub dataset. Article Digital Object Identifiers (DOIs) are broken into two parts: a prefix and a suffix. As I understand it, before AGU journals were published by Wiley, AGU articles used the prefix 10.1029, so I first extracted all entries with this older prefix. This restricts the analysis to pre-2013 AGU articles (I’m not sure exactly when the change occurred), although some of these older articles may predate the 1997 ‘open access’ cutoff.

I was left with a list of 171,752 articles (from the original 62 million).
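The prefix filter is easy to reproduce: everything before the first `/` in a DOI is the prefix, and everything after it is the suffix. A minimal Julia sketch, assuming the Sci-Hub DOI list has been loaded one DOI per line (for example with `readlines`); `split_doi` and `agu_dois` are hypothetical helper names, not the code used for this post:

```julia
# Split a DOI at its first '/' into (prefix, suffix), e.g. "10.1029/2006GL028162".
function split_doi(doi::AbstractString)
    parts = split(strip(doi), '/'; limit = 2)
    return length(parts) == 2 ? (String(parts[1]), String(parts[2])) : (String(strip(doi)), "")
end

# Keep only DOIs that carry the older AGU prefix 10.1029.
agu_dois(dois) = filter(d -> first(split_doi(d)) == "10.1029", dois)

split_doi("10.1029/2006GL028162")   # → ("10.1029", "2006GL028162")
```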

The suffix of an AGU DOI corresponds to a single article from a specific journal, and letter codes in the suffix denote the journal. For example, a GRL article has a suffix that includes ‘GL’ or ‘gl’, as in this DOI: 10.1029/2006GL028162

Each AGU journal has a unique letter combination in the suffix, so the DOIs can be counted by journal. Here is the line of Julia code I used to count GRL articles in the list of ‘10.1029’ DOIs (held in the string s):

GRL = length(collect(eachmatch(r"gl"i, s)))  # matchall(r"gl"i, s) in pre-1.0 Julia; eachmatch replaces it
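Counting matches of `gl` across one long string works, but it can also pick up letters that happen to appear in an unrelated suffix, and it needs one pass per journal. A per-DOI alternative is to pull out the run of letters that follows the leading digits of each suffix and tally those codes in a dictionary. A minimal Julia sketch, assuming every journal-article suffix looks like digits, then the journal letters, then digits (as in 2006GL028162); `tally_journal_codes` is a hypothetical helper, and suffixes that do not fit that pattern are collected separately, which is one way to find the DOIs that are not journal articles:

```julia
# Tally AGU DOI suffixes by journal letter code, e.g. "2006GL028162" -> "GL".
# Assumes suffixes look like <digits><letters><digits>; anything else is unmatched.
function tally_journal_codes(suffixes)
    counts = Dict{String,Int}()
    unmatched = String[]
    for s in suffixes
        m = match(r"^\d+([A-Za-z]+)\d", s)
        if m === nothing
            push!(unmatched, String(s))
        else
            code = uppercase(m.captures[1])   # 'gl' and 'GL' count as the same journal
            counts[code] = get(counts, code, 0) + 1
        end
    end
    return counts, unmatched
end
```

`sort(collect(counts); by = last, rev = true)` then lists the journal codes from most to fewest articles.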

Parsing the 171,752 articles into specific journals yields:

[Figure: number of 10.1029 DOIs on Sci-Hub for each AGU journal]

  • Of these AGU journals, GRL has the most articles on Sci-Hub; note also the huge volume of EOS articles (?!?) and of Water Resources Research articles.
  • The older, higher-volume JGR sections (Oceans, Atmospheres, Solid Earth, and Space Physics) outweigh the newer, smaller sections (Biogeosciences, Earth Surface, Planets). Here are some JGR publication stats.
  • ~5% of the 171,752 AGU DOIs did not conform to this search — they might be books, chapters, or other documents.

Here is another interesting article on Sci-Hub, from Science (Bohannon, 2016): ‘Who’s Downloading Pirated Papers? Everyone’.

Debris Flow Experiments (Spring 2017)

This spring I taught an undergraduate Geomorphology class at Duke. For the last few weeks of class, I broke out my debris flow flume. I have written about this flume previously, and it is described here on the Sediment Experimentalist Network site. Also posted is a slo-mo video of a typical debris flow.

Students planned and executed an experiment of their choosing, an example of a ‘Course-based Undergraduate Research Experience’ (CURE). Though there has been some work with ‘scaled down’ debris flows (e.g., de Haas et al. 2015), there seemed to be lots of room for the students to do something new.

Both groups ended up investigating various measures for slowing or stopping debris flows. This involved 3D printing several mitigation structures, from solid walls of various sizes and angles:


…to plates with various densities of upright rods/sticks to function as tree/vegetation mimics:


Each group ended up writing up their work as a paper (data and plots included), and I’m happy to share them here:

  • Paper 1 focused on solid walls
  • Paper 2 focused on the ‘green infrastructure’ mimics.

‘Sleeping Beauties’ of Geomorphology: cases from the Journal of Geology

To recap from a previous post:

“Most papers in disciplinary geomorphology journals are cited at some point, but citations to papers do not always accrue immediately upon publication — ideas and papers might take time to be used by researchers and therefore cited. Extreme examples of delayed recognition (‘Sleeping Beauties‘) — where papers receive no citations for long stretches of time only to receive a large, late burst in citations — have been identified and investigated previously.

Do geomorphology ‘Sleeping Beauties’ exist? Using the methods of Ke et al. (2015) to find and score ‘Sleeping Beauties’, it turns out that 9 out of the 20 most delayed papers in GSA Bulletin are focused on quantitative geomorphology. What other papers show this interesting signature of delayed recognition?”

Today I want to look for Sleeping Beauties in The Journal of Geology (JG).

JG has been published since 1893 and has been the venue for some classic geomorphology papers (e.g., Wolman and Miller, 1960, ‘Magnitude and Frequency of Forces in Geomorphic Processes’, which I will discuss in a future post).

In January 2017 I downloaded the citation time series for the 500 most cited articles in The Journal of Geology and used the algorithm of Ke et al. (2015) to find the papers with the highest ‘delayed recognition’ score: a ranking of each paper’s citation time series based on its largest, latest peak (read Ke et al. (2015) to learn more about the method).
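For readers who want the gist before opening the paper: Ke et al. (2015) score a yearly citation series c_t (t = 0 is the publication year, t_m the year of the citation peak) with a ‘beauty coefficient’ B. B sums, over the years up to the peak, how far each year’s citations fall below the straight line joining c_0 and c_{t_m}, with each term divided by max(1, c_t). A minimal Julia sketch of that formula (my own transcription, not the code used for this post; ties for the peak are broken by taking the first one, which is an assumption):

```julia
# Beauty coefficient B of Ke et al. (2015) for a yearly citation series.
# c[1] holds citations in the publication year (t = 0), c[2] the next year, and so on.
function beauty_coefficient(c::AbstractVector{<:Real})
    cm, ipeak = findmax(c)        # peak citations and the index of the (first) peak
    tm = ipeak - 1                # years from publication to the peak
    tm == 0 && return 0.0         # peak in the publication year: no delay
    c0 = c[1]
    B = 0.0
    for t in 0:tm
        line = (cm - c0) / tm * t + c0           # straight line from (0, c0) to (tm, cm)
        B += (line - c[t+1]) / max(1, c[t+1])    # shortfall below the line, scaled
    end
    return B
end
```

`sortperm(beauty_coefficient.(series); rev = true)` then orders a hypothetical vector of citation series `series` from most to least delayed.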

The top four papers, published from 1922 to 1935, all focus on grain size and shape:

  1. Wentworth, C. K. (1922). A scale of grade and class terms for clastic sediments. The Journal of Geology, 30(5), 377-392. (pdf here)
  2. Wadell, H. (1935). Volume, shape, and roundness of quartz particles. The Journal of Geology, 43(3), 250-280. (article here)
  3. Wadell, H. (1932). Volume, shape, and roundness of rock particles. The Journal of Geology, 40(5), 443-451. (article here)
  4. Wadell, H. (1933). Sphericity and roundness of rock particles. The Journal of Geology, 41(3), 310-331. (article here)

The citation time series for each paper is shown below:

[Figure: yearly citation time series for the four Journal of Geology papers]

As in the last post, I will not offer ‘reasons’ for the explosion in citations to these papers over the past 10 years. A first step toward explaining it would be a careful look at co-citation networks (which papers are frequently cited alongside these ones) and at how the papers are actually used and cited in the text.
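The co-citation idea can be made concrete with the reference lists of the papers that cite a target paper: every other reference in a citing paper counts as one co-citation with the target. A minimal Julia sketch, assuming those reference lists have already been exported as vectors of identifier strings such as DOIs; `cocitation_counts` is a hypothetical helper, and it only counts co-occurrences, without looking at in-text usage:

```julia
# Count how often each reference is cited alongside `target` (co-citation),
# given the reference lists of a set of citing papers.
function cocitation_counts(reference_lists, target)
    counts = Dict{String,Int}()
    for refs in reference_lists
        target in refs || continue            # only papers that cite the target
        for r in unique(refs)
            r == target && continue
            counts[r] = get(counts, r, 0) + 1
        end
    end
    return sort(collect(counts); by = last, rev = true)   # most co-cited first
end
```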

I took a cursory look at the co-cited papers, and all four papers show an affinity to two recent, well-cited papers:

  • Blott, S. J., & Pye, K. (2001). GRADISTAT: a grain size distribution and statistics package for the analysis of unconsolidated sediments. Earth Surface Processes and Landforms, 26(11), 1237-1248. http://doi.org/10.1002/esp.261
  • Blott, S. J., & Pye, K. (2008). Particle shape: a review and new methods of characterization and classification. Sedimentology, 55(1), 31-63. http://doi.org/10.1111/j.1365-3091.2007.00892.x

Last I looked, Blott and Pye (2001) was the most cited paper in ESPL, and it is cited in a policy document, which is rare for a geomorphology paper.

‘Sleeping Beauties’ of Geomorphology: a case from the American Journal of Science

Most papers in disciplinary geomorphology journals are cited at some point, but citations do not always accrue immediately upon publication: ideas and papers might take time to be used by researchers and therefore cited. Extreme examples of delayed recognition (‘Sleeping Beauties’), where papers receive no citations for long stretches of time only to receive a large, late burst in citations, have been identified and investigated previously.

Do geomorphology ‘Sleeping Beauties’ exist? Using the methods of Ke et al. (2015) to find and score ‘Sleeping Beauties’, it turns out that 9 out of the 20 most delayed papers in GSA Bulletin are focused on quantitative geomorphology.

What other papers show this interesting signature of delayed recognition?

I have looked in other journals and found a few neat examples, which I hope to chronicle in a series of posts. Today, I will look at an example from the American Journal of Science (AJS):

The AJS has been published since 1818 and has long been a venue for geology. In January 2017 I downloaded the 500 most cited AJS articles from the Web of Science and used the algorithm presented in Ke et al. (2015) to find the papers with the highest ‘delayed recognition’ score: a ranking of each paper’s citation time series based on the largest, latest peak (I urge you all to read Ke et al. (2015), which describes the method).

The most delayed paper is about brachiopods, but I want to focus on research related to geomorphology, so let’s look at the 2nd most delayed paper:

W. W. Rubey (1933): Settling velocities of gravel, sand, and silt particles. Am J Sci April 1, 1933 Series 5 Vol. 25:325-338; doi:10.2475/ajs.s5-25.148.325

(n.b., settling velocity has a special place in my heart)

Rubey’s paper has a score similar to those of the delayed papers from GSA Bulletin. Here is the citation time series for the Rubey paper:

[Figure: yearly citation time series for Rubey (1933)]
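Ke et al. (2015) also define an ‘awakening time’, which helps pin down when a burst like this one starts: among the years up to the citation peak t_m, it is the year t whose citation count c_t lies farthest from the straight line joining (0, c_0) and (t_m, c_{t_m}). A minimal Julia sketch of that definition (my own transcription of the paper’s formula, with ties broken by taking the earliest year, which is an assumption); the function name is hypothetical:

```julia
# Awakening time t_a of Ke et al. (2015): the year (up to the citation peak) whose
# citation count lies farthest from the line joining (0, c0) and (tm, cm).
# c[1] holds citations in the publication year; the result is in years after publication.
function awakening_time(c::AbstractVector{<:Real})
    cm, ipeak = findmax(c)
    tm = ipeak - 1
    tm == 0 && return 0
    c0 = c[1]
    # Numerator of the point-to-line distance; the denominator
    # sqrt((cm - c0)^2 + tm^2) is the same for every t, so it is dropped.
    d(t) = abs((cm - c0) * t - tm * c[t+1] + tm * c0)
    ts = 0:tm
    return ts[argmax(map(d, ts))]
end
```

For a paper published in 1933, `1933 + awakening_time(c)` converts the result to a calendar year.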

So the natural question is: what caused this 2014 burst of citations? As far as I can tell (from looking at the papers that cite Rubey), nothing in particular. Most papers that cite Rubey focus on typical sediment transport questions. A close read of all the citing papers would be needed to figure out whether there is some ‘signal’ here. This is not a satisfying answer, and I apologize; leave a comment if you have an idea, and I’ll update the post if I find anything out.

Arduino and Raspberry Pi in Geoscience research

Nature reported last week on the uptick in the use of Arduino and Raspberry Pi boards in research. The idea of building research tools with open source hardware has been covered before (see Pearce 2012 for an example), but this recent article had a nice plot of the number of papers per year that mention these boards (based on PubMed and Scopus).

After the article last week, I wondered how many Geoscience articles actually use an Arduino or Raspberry Pi….

Using the Web of Science, I found fewer than 10 articles under the ‘Geosciences Multidisciplinary’, ‘Geology’, and ‘Geography Physical’ topics that mention ‘Raspberry Pi’ or ‘Arduino’ in the title, keywords, or abstract. Not much uptake in the Earth sciences, I guess.

That said, the articles that do use the Arduino are very neat, including a system for geophone data acquisition, a microscope focus stacker, an earth flow monitoring tool, and temperature-sensing waders.

I have also seen Earth science research using these boards at AGU poster sessions that highlight low-cost tech, and I have read about the Raspberry Shake, which could generate a host of papers in the future.

My interest here comes from dabbling with these two tools in the past. With the Arduino I have actually built a few things, including a primitive Optical Backscatter Sensor (OBS), a datalogger, and an ultrasonic distance sensor (see below; pic from 2014). I hope to get back to that dabbling someday.

[Photo: Arduino-based ultrasonic distance sensor (2014)]
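An ultrasonic distance sensor rests on simple arithmetic: it times an ultrasonic pulse’s round trip to a target and back, so distance = (speed of sound × echo time) / 2. A minimal Julia sketch of that conversion, assuming the echo time is reported in microseconds and using the common linear approximation for the speed of sound in air, 331.3 + 0.606 × T m/s (T in °C); the function names are hypothetical, and this is not the code from my build:

```julia
# Speed of sound in air (m/s), linear approximation in air temperature (°C).
speed_of_sound(temp_c::Real = 20.0) = 331.3 + 0.606 * temp_c

# Convert an echo round-trip time (microseconds) to a one-way distance (cm).
# The pulse travels to the target and back, hence the division by 2.
function echo_to_distance_cm(echo_us::Real; temp_c::Real = 20.0)
    v_cm_per_us = speed_of_sound(temp_c) * 100 / 1e6   # m/s -> cm/µs
    return echo_us * v_cm_per_us / 2
end

echo_to_distance_cm(1000)   # ≈ 17.17 (cm, at 20 °C)
```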

US East Coast foredune grasses: quantifying the abundance of literature

Two grasses tend to cover much of the coastal foredunes of the US Atlantic coast. North of the North Carolina/Virginia area, foredunes are often covered in Ammophila breviligulata (American Beachgrass); south of the NC/VA area, they are often covered in Uniola paniculata (Sea Oats). After my look at how much is written about ‘Coastal Dunes’, I wanted to look at how much is written about these two species. In early March 2017, I searched for each species separately in the Web of Science. Each search was a ‘topic’ search, so hits come from paper titles, abstracts, and keywords.

Various other plants are present on the shifting sands of East coast foredunes, such as Panicum amarum (Bitter Panic Grass), Spartina patens (Saltmeadow Cordgrass), and Iva imbricata (Dune-marsh elder), to name a few. I included P. amarum in this analysis just for fun.

Shown below is the number of papers written about each species in 5 year bins.

[Figure: number of papers per 5-year bin for A. breviligulata, U. paniculata, and P. amarum]

A. breviligulata also grows along the shores of the US Great Lakes and on the US West Coast; I would guess this explains the dominance of A. breviligulata studies.

  • The ratio of papers per 5-year period for A. breviligulata : U. paniculata : P. amarum is roughly 5:3:1.
  • The ratio of the sizes (in bytes) of the Wikipedia articles on each species is currently 3:2:1.
  • I keep wondering whether the ratio of papers about the species reflects the ratio of total shoreline covered by each species… or perhaps the ratio of some other abundance metric…
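The 5-year binning and the paper ratio are easy to recompute from the publication years returned by each search. A minimal Julia sketch, assuming the years for each species have been exported from the Web of Science into plain vectors and that bins start on years divisible by 5; `bin_counts` and `ratio_to_smallest` are hypothetical helpers. When every species is counted over the same bins, the ratio of totals equals the ratio of average papers per 5-year period:

```julia
# Count papers per 5-year bin; a bin is labeled by its first year (e.g. 1990 for 1990-1994).
function bin_counts(years; width::Int = 5)
    counts = Dict{Int,Int}()
    for y in years
        b = y - mod(y, width)
        counts[b] = get(counts, b, 0) + 1
    end
    return sort(collect(counts); by = first)   # pairs ordered by bin start year
end

# Express totals relative to the smallest one, e.g. [50, 30, 10] -> [5.0, 3.0, 1.0].
ratio_to_smallest(totals) = totals ./ minimum(totals)
```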

I have a paper in review about some of the geomorphic consequences of these different foredune species.