Correlation between article downloads and citations (in ESurf)

Previous work has established a correlation between article downloads and citations (e.g., Perneger, 2004; Brody et al., 2006; Yan and Gerstein, 2011). The only exclusively Earth science journals I know of that make article level metrics viewable are published by Copernicus (see here for info on their article level metrics).

To test for correlation between citation data and article level metrics (in a variety of forms), I downloaded (from the Web of Science) the 2015 cumulative citation counts for all articles published in Earth Surface Dynamics (ESurf) during 2013 and 2014. Admittedly, this is a small sample (45 papers). I compared these citation records with the cumulative article level metrics for each article (through December 2015).

The article level metrics are broken into three categories: pdf downloads, xml downloads and html views. For a given paper, pdf downloads account for an average of 42% of total engagements, html views are 52%, and xml downloads are 5%.
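A minimal sketch of how per-paper shares like these could be computed, assuming each paper is stored as a dictionary of counts. The function name, the keys ("pdf", "html", "xml"), and the example numbers are all made up for illustration; they are not the ESurf data. The share is computed for each paper first and then averaged across papers, which matches the "for a given paper ... an average of" wording above.

```python
from statistics import mean


def mean_engagement_shares(papers):
    """Average, across papers, of each category's share of that paper's total engagements."""
    categories = ("pdf", "html", "xml")
    shares = {c: [] for c in categories}
    for counts in papers:
        total = sum(counts[c] for c in categories)
        if total == 0:
            continue  # skip papers with no recorded engagement
        for c in categories:
            shares[c].append(counts[c] / total)
    return {c: mean(values) for c, values in shares.items()}


# Made-up counts for illustration only; these are NOT the ESurf numbers.
example_papers = [
    {"pdf": 400, "html": 500, "xml": 100},
    {"pdf": 300, "html": 600, "xml": 100},
]
example_shares = mean_engagement_shares(example_papers)
```

Averaging the per-paper shares weights every paper equally; dividing the summed counts instead would let a few heavily viewed papers dominate.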

Here is what the data look like:

Citations vs all engagement (combined html, pdf and xml):


Citations vs html views:


Citations vs pdf downloads:


Finally, citations vs xml downloads:


The correlation coefficients (r) for these plots:

  • 0.64 for citations and all engagements
  • 0.49 for citations and html views
  • 0.70 for citations and pdf downloads
  • 0.25 for citations and xml downloads
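
For reference, here is a minimal sketch of a correlation coefficient calculation, assuming r above is the standard Pearson coefficient (the post does not say which variant was used). The function name and the example numbers are hypothetical and are not the ESurf data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]  # deviations from the mean
    dy = [b - mean_y for b in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    spread_x = math.sqrt(sum(a * a for a in dx))
    spread_y = math.sqrt(sum(b * b for b in dy))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return covariance_sum / (spread_x * spread_y)


# Made-up values for illustration only; these are NOT the ESurf numbers.
example_downloads = [120, 340, 560, 210, 980]
example_citations = [1, 3, 4, 2, 9]
example_r = pearson_r(example_downloads, example_citations)
```

With only 45 papers, a single heavily viewed article can shift r noticeably, so the values are best read as rough indications.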

In these data, article level metrics tend to be correlated with article citations. Some large outliers for html views are likely because this dataset spans the first few ESurf papers, and we were all checking out 1) how the manuscripts looked online; 2) the typesetting; and 3) the open review format.

Longer data records (perhaps from other Copernicus journals) will help to firm up these correlations. However, there is currently no way (that I know of) to obtain the Copernicus data without a lot of manual work; that is, there is no API like the one PLoS offers.

A general reference for this work has been:

Haustein S (2014) Readership metrics. In: Cronin B, Sugimoto C, editors. Beyond Bibliometrics: Harnessing Multi-dimensional Indicators of Performance. Cambridge, MA: MIT Press. (Note: this whole book is a great reference.)

(again, plots here were made using Tufte in R)



