How to download search results on Google Scholar using R?

Question

I would like to extract the first 100 results (say) of a Google Scholar search using R. Does anyone know how to do it?

To be precise, I just need the name of the paper, authors and citation count.

P.S. Would this be legal?

It looks like Google Scholar is lacking a [nice API](http://code.google.com/p/google-ajax-apis/issues/detail?id=109&colspec=ID%20Type%20Stars%20Status%20Modified%20Summary%20APIType%20Opened) – csgillespie, Feb 15 '11 at 16:54
Re your PS: I have looked at the "about" page (http://scholar.google.ca/intl/en/scholar/about.html) and don't see any explicit terms of use – Ben Bolker, Feb 15 '11 at 19:34
Also http://tonybreyal.wordpress.com/2011/11/08/web-scraping-google-scholar-partial-success/ – Ben Bolker, Nov 09 '11 at 15:41
And the update: http://tonybreyal.wordpress.com/2011/11/08/web-scraping-google-scholar-part-2-complete-success/ – Ben Bolker, Nov 09 '11 at 21:45
Not a strict answer, but I'd suggest learning Python for web scraping tasks. Even if you don't plan on using it for statistical programming, in my experience it is a lot easier for scraping and has more references you can use. I spent the time to learn it on top of R, and definitely don't think that was time wasted. – verybadatthis, Jun 28 '16 at 21:00

score 6 · Accepted Answer · answered Nov 09 '11 at 14:06

Please consider the updated theBioBucket post:

http://thebiobucket.blogspot.com/2011/11/r-function-google-scholar-webscraper.html

Kay

Sorry, the script on theBioBucket is outdated due to changes on Google Scholar. I have no idea when I will get a chance to fix it. – Kay, Jun 28 '12 at 09:41

score 4 · Answer 2 · answered Feb 15 '11 at 19:33

There are some Python and Perl scrapers out there that you might be able to adapt, linked at http://bmb-common.blogspot.com/2011/02/does-google-scholar-suck-or-am-i-just.html

Ben Bolker

score 3 · Answer 3 · edited May 23 '17 at 10:31

I can't speak to the legalities of your task, but there are a few ways you can go about this. While I am not strong in XPath, it might be the best way. I believe that you can use the XML package to retrieve the page contents and use XPath to extract the data of the elements you need.

For instance, I use Chrome as my browser, and when I inspected the page with Developer Tools, there does appear to be a structure to the page, with the data "hidden" inside various tags that you should be able to exploit really easily using XPath.

Check out this link for an example of using XPath.
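
As a rough illustration of the XPath idea, here is a minimal sketch that parses a small inline HTML snippet with the XML package. The markup and the class names (`gs_r`, `gs_rt`, `gs_a`) and the "Cited by" link text are assumptions standing in for whatever structure you see in Developer Tools; they may not match the live page, and the helper `first_text` is hypothetical.

```r
library(XML)

# Hypothetical markup standing in for a results page. The class names and
# the "Cited by N" link text are assumptions, not a documented format.
html <- '
<html><body>
  <div class="gs_r">
    <h3 class="gs_rt"><a href="#">First paper</a></h3>
    <div class="gs_a">A Author, B Author - Some Journal, 2010</div>
    <div class="gs_fl"><a href="#">Cited by 12</a></div>
  </div>
  <div class="gs_r">
    <h3 class="gs_rt"><a href="#">Second paper</a></h3>
    <div class="gs_a">C Author - Another Journal, 2009</div>
  </div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# One node per search result, so fields stay aligned even when a result
# has no citation link.
nodes <- getNodeSet(doc, "//div[@class='gs_r']")

# Return the text of the first match of a relative XPath, or NA if none.
first_text <- function(node, path) {
  hit <- xpathSApply(node, path, xmlValue)
  if (length(hit) == 0) NA_character_ else trimws(hit[[1]])
}

cite_text <- vapply(nodes, first_text, character(1),
                    path = ".//a[starts-with(normalize-space(.), 'Cited by')]")

results <- data.frame(
  title     = vapply(nodes, first_text, character(1),
                     path = ".//h3[@class='gs_rt']"),
  authors   = vapply(nodes, first_text, character(1),
                     path = ".//div[@class='gs_a']"),
  citations = as.integer(sub("^Cited by\\s+", "", cite_text)),
  stringsAsFactors = FALSE
)

print(results)
```

Extracting per result node rather than with three page-wide queries keeps title, authors and citation count on the same row when one of them is missing.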

HTH and Good Luck

score 3 · Answer 4 · answered Feb 15 '11 at 17:37

You can definitely retrieve the HTML content of the page using RCurl and parse it using the XML package, as suggested by Btibert3. The only issue you might face is that Google won't allow you to run queries in a "robotic" way. After something like 200 queries in a short period of time, it stops returning results. Maybe that's different with Google Scholar, but I doubt it.
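
A minimal sketch of the retrieval step with RCurl, pausing between requests, might look like the following. The URL shape (`q` and `start` parameters), the page size of 10, and the helper names `scholar_url` and `fetch_pages` are assumptions for illustration, not a documented interface, and a delay is no guarantee that requests will not be blocked.

```r
library(RCurl)

# Build the URL for one page of results. The "q" and "start" parameters
# are assumptions about the search URL, not a documented API.
scholar_url <- function(query, start = 0) {
  paste0("https://scholar.google.com/scholar?q=", curlEscape(query),
         "&start=", start)
}

# Fetch enough pages to cover n_results, sleeping between requests.
# per_page = 10 is an assumed page size; delay is in seconds.
fetch_pages <- function(query, n_results = 100, per_page = 10, delay = 5) {
  starts <- seq(0, n_results - 1, by = per_page)
  pages <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    pages[[i]] <- getURL(scholar_url(query, starts[i]),
                         followlocation = TRUE)
    if (i < length(starts)) Sys.sleep(delay)
  }
  pages
}
```

Calling `fetch_pages("your query")` would return a list of raw HTML strings, each of which could then be handed to `htmlParse(page, asText = TRUE)` for XPath extraction.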

score 1 · Answer 5 · answered Nov 05 '11 at 10:28

A solution was recently published here:

http://thebiobucket.blogspot.com/2011/11/visually-examine-google-scholar-search.html

Tal Galili
