I'm working on a project to analyse how journal articles are cited. I have a large file of journal article names. I intend to pass them to Google Scholar and see how many citations each has.
Here is the strategy I am following:
Use "scholar.py" from http://www.icir.org/christian/scholar.html. This is a pre-written Python script that searches Google Scholar and returns information on the first hit in CSV format (including the number of citations).
Google Scholar blocks you after a certain number of searches, and I have roughly 3000 article titles to query. From what I have found, most people use Tor to get around this (see "How to make urllib2 requests through Tor in Python?" and "Prevent Custom Web Crawler from being blocked"). Tor routes your traffic through a changing chain of relays, so the IP address the site sees changes every few minutes.
I have scholar.py and Tor both set up and working. I'm not very familiar with Python or the urllib2 library, and I would like to know what modifications scholar.py needs so that its queries are routed through Tor.
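To make the question concrete, here is a minimal sketch of what I understand the general approach to be. It is not taken from scholar.py. It assumes an HTTP proxy such as Privoxy is listening on 127.0.0.1:8118 and forwarding to Tor's SOCKS port, since urllib2 does not speak SOCKS on its own. The names PROXY_URL, build_tor_opener and install_tor_opener are my own placeholders.

```python
try:
    import urllib2 as urlrequest         # Python 2
except ImportError:
    import urllib.request as urlrequest  # Python 3

# Assumption: Privoxy (or a similar HTTP proxy) listens here and forwards to Tor.
PROXY_URL = "http://127.0.0.1:8118"


def build_tor_opener(proxy_url=PROXY_URL):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    proxy = urlrequest.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urlrequest.build_opener(proxy)


def install_tor_opener(proxy_url=PROXY_URL):
    """Make plain urlopen() calls in this process go through the proxy."""
    opener = build_tor_opener(proxy_url)
    urlrequest.install_opener(opener)
    return opener
```

Calling install_tor_opener() once at startup should affect every later urlopen() call. I have not verified whether scholar.py calls urlopen() or builds its own opener; if it builds its own, I assume the ProxyHandler would have to be passed into that build_opener() call instead.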
I am also open to suggestions for an easier (and potentially quite different) approach to bulk Google Scholar queries, if one exists.
Thanks in advance