
I have a list of authors. For each author, I want to automatically retrieve or calculate a citation index (h-index, m-quotient, g-index, HCP indicator, ...), ideally for each year:

Author Year Index
first  2000   1
first  2001   2
first  2002   3

I can calculate all of these metrics myself, given the citation count of each paper by each researcher:

Author Paper Year Citation_count
first    1    2000   1
first    2    2000   2
first    3    2002   3
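A minimal sketch of that calculation in Python, assuming the per-paper table above is available as a list of `(author, paper, year, citation_count)` tuples. The function names are hypothetical. Because the table only holds each paper's current citation count, the "yearly" value here is an approximation: for year Y it counts only papers published up to Y, not citations received up to Y. The m-quotient convention (counting the first publication year as year 1) is also an assumption.

```python
from collections import defaultdict


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    # With counts sorted high-to-low, "count >= rank" holds for a prefix of the list.
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)


def g_index(citations):
    """Largest g (capped at the number of papers) such that the top g papers
    together have at least g**2 citations."""
    counts = sorted(citations, reverse=True)
    g, total = 0, 0
    for rank, c in enumerate(counts, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def yearly_indices(rows, years):
    """rows: iterable of (author, paper, year, citation_count).

    For each author and each year Y in `years`, compute the indices using only
    papers published up to Y, with their *current* citation counts (an
    approximation: the table does not say when each citation was received).
    Returns a list of (author, Y, h, g, m) tuples.
    """
    by_author = defaultdict(list)
    for author, _paper, year, cites in rows:
        by_author[author].append((year, cites))

    result = []
    for author, papers in by_author.items():
        first_year = min(year for year, _ in papers)
        for y in years:
            cites = [c for year, c in papers if year <= y]
            if not cites:
                continue  # no papers published yet
            h = h_index(cites)
            # m-quotient: h divided by career length; counting the first year
            # as year 1 is a convention choice that avoids dividing by zero.
            m = h / (y - first_year + 1)
            result.append((author, y, h, g_index(cites), m))
    return result


rows = [("first", 1, 2000, 1), ("first", 2, 2000, 2), ("first", 3, 2002, 3)]
for row in yearly_indices(rows, [2000, 2001, 2002]):
    print(row)
```

With the example rows this prints `('first', 2000, 1, 1, 1.0)`, `('first', 2001, 1, 1, 0.5)` and `('first', 2002, 2, 2, 0.6666666666666666)`. The hard part remains getting the input table, not the arithmetic.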

Despite my efforts, I have not found an API or scraping method that can provide these data.

My institution has access to a number of services including Web of Science.

Etienne Low-Décarie
    http://bmb-common.blogspot.ca/2011/11/google-scholar-still-sucks.html has some information -- in particular, the `CITAN` package looks quite powerful if you have access to Scopus; there have also been some recent PubMed-scraping posts on r-bloggers (whether this works for you or not depends on whether you are happy with PubMed coverage in your field). Even if you could scrape WoS, it's not permitted by their terms of service ... – Ben Bolker May 10 '12 at 15:02
  • @Ben Bolker, Thank you for the suggestions, this does point me in the right direction. – Etienne Low-Décarie May 10 '12 at 15:06
  • This is probably where a solution will be created: http://ropensci.org/project-overview/ – Etienne Low-Décarie May 10 '12 at 15:13
    https://github.com/ropensci/raltmet/blob/master/R/citedin.r – Etienne Low-Décarie May 10 '12 at 17:50
    All useful information, thanks for digging it out (if you put together an answer from these bits and pieces it would be great to post it here as an answer to your question). Still very much restricted by the data sources (e.g. PubMed), but things are developing in a useful way. – Ben Bolker May 10 '12 at 18:15
  • Jeff Leek, Roger Peng, and Rafa Irizarry produced functions that tie in to Google Scholar. http://simplystatistics.tumblr.com/post/13203811645/an-r-function-to-analyze-your-google-scholar-citations – Etienne Low-Décarie May 31 '12 at 02:33
    those are nice, but note that they tie into Google scholar *citations* -- i.e. into the page you can pull up with your own citation report, not a general purpose search (I think) – Ben Bolker May 31 '12 at 06:45

1 Answer


Essentially, the main problem is building the citation graph. Once you have it, you can compute any metric you want (e.g. h-index, g-index, PageRank).
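To illustrate why the graph is enough, here is a small Python sketch, assuming the graph is a dict mapping each paper id to the ids it cites and that a separate (hypothetical) dict maps each paper to its authors. Citation counts are just in-degrees; per-author h-indices follow from those counts, and PageRank is a plain power iteration with damping 0.85 (a common default, chosen here as an assumption). None of the names come from an existing library.

```python
from collections import defaultdict


def citation_counts(graph):
    """graph: paper id -> list of paper ids it cites. Returns in-degree per paper."""
    counts = defaultdict(int)
    for citing, cited_list in graph.items():
        counts[citing] += 0  # make sure uncited papers appear with 0
        for cited in set(cited_list):
            counts[cited] += 1
    return dict(counts)


def h_index(citations):
    counts = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)


def author_h_indices(graph, authors):
    """authors: paper id -> list of author names (hypothetical input format)."""
    counts = citation_counts(graph)
    per_author = defaultdict(list)
    for paper, names in authors.items():
        for name in names:
            per_author[name].append(counts.get(paper, 0))
    return {name: h_index(c) for name, c in per_author.items()}


def pagerank(graph, damping=0.85, iterations=100):
    """Plain power-iteration PageRank; papers that cite nothing in the
    collection spread their score uniformly over all papers."""
    nodes = set(graph)
    for cited_list in graph.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in nodes if not graph.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        for citing, cited_list in graph.items():
            targets = set(cited_list)
            for t in targets:
                new[t] += damping * rank[citing] / len(targets)
        rank = new
    return rank


graph = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["A", "B", "C"]}
authors = {"A": ["first"], "B": ["first"], "C": ["first", "second"], "D": ["second"]}
print(author_h_indices(graph, authors))  # {'first': 2, 'second': 1}
```

In this toy graph, paper A is cited by every other paper, so it also ends up with the highest PageRank score.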

Supposing you have a collection of papers (retrieved in some way), you can extract the citations from each of them and build the citation graph. You might find ParsCit useful: it is an open-source CRF-based package for parsing reference strings and logical document structure, also used by CiteSeerX, and it works great.
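Once references have been extracted, they still have to be matched to papers in your collection to become graph edges. A minimal Python sketch of that step, assuming the reference strings have already been reduced to cited titles (this input format is hypothetical and is not ParsCit's actual output). Matching is by normalized title, with a fuzzy fallback via the standard library's `difflib.get_close_matches`; the 0.9 cutoff is an arbitrary assumption.

```python
import difflib
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    title = re.sub(r"[^\w\s]+", " ", title.lower())
    return " ".join(title.split())


def build_citation_graph(papers, parsed_references, cutoff=0.9):
    """papers: paper id -> title of every paper in the collection.
    parsed_references: paper id -> list of cited titles already extracted from
    that paper's reference list (hypothetical format; a parser such as ParsCit
    would have to be run first and its output mapped to titles).
    Returns paper id -> list of cited paper ids inside the collection."""
    index = {normalize_title(t): pid for pid, t in papers.items()}
    known = list(index)
    graph = {pid: [] for pid in papers}
    for citing, titles in parsed_references.items():
        edges = graph.setdefault(citing, [])
        for title in titles:
            key = normalize_title(title)
            if key not in index:
                # Fuzzy fallback for small typos or OCR errors in reference strings.
                close = difflib.get_close_matches(key, known, n=1, cutoff=cutoff)
                if not close:
                    continue  # cited work is not in the collection
                key = close[0]
            cited = index[key]
            if cited != citing and cited not in edges:
                edges.append(cited)
    return graph
```

Title matching is only one option; if your sources provide DOIs, matching on those is more reliable.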

Leonardo