I'm a graduate student in economics, currently working on a research project that involves Google Scholar. Economists usually use Stata, but Google Scholar is easier to access from R, so I've been learning R for the past week. Needless to say, I'm a beginner and there are loads of things I don't really understand yet.
I managed to scrape a list of economists and to draw a random sample from that list. I would now like to get some Google Scholar information about these academics. To do so, I plan on using the 'scholar' package.
My problem is that 'scholar' asks for Google Scholar IDs. I only have the names of the economists, so I would like to retrieve their IDs.
I basically want to run a Google Scholar query for each economist: https://scholar.google.fr/scholar?hl=fr&as_sdt=0%2C5&q="NAME OF THE ECONOMIST" and find the Google Scholar ID in the HTML of the results page.
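To make the per-economist query concrete, here is a minimal sketch of how I imagine building the URLs. The helper name build_scholar_url and the second name in the vector are made up for illustration, and I'm assuming the name should be wrapped in double quotes as in the URL pattern above; URLencode() comes from the utils package that ships with R.

```r
# Hypothetical helper (the name is my own): build the search URL for one economist.
build_scholar_url <- function(name) {
  # Wrap the name in double quotes, then percent-encode the quotes and spaces.
  query <- utils::URLencode(paste0('"', name, '"'), reserved = TRUE)
  paste0("https://scholar.google.fr/scholar?hl=fr&as_sdt=0%2C5&q=", query)
}

# Placeholder sample; in practice this would be the random sample of names.
economists <- c("Emmanuel Saez", "Another Economist")

# One URL per name, returned as a plain character vector.
urls <- vapply(economists, build_scholar_url, character(1), USE.NAMES = FALSE)
```

Each element of urls could then be passed to read_html() in turn, ideally with a pause between requests.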
I tried with economist "Emmanuel Saez" to get started: https://scholar.google.fr/scholar?hl=fr&as_sdt=0%2C5&q=Emmanuel+Saez&btnG=
The relevant CSS selector is ".gs_rt2", so my code (using the rvest package) looks like:
library(rvest)
page <- read_html("https://scholar.google.fr/scholar?hl=fr&as_sdt=0%2C5&q=Emmanuel+Saez&btnG=")
text <- html_nodes(page, ".gs_rt2")
The object "text" looks something like this:
[1] <h4 class="gs_rt2"><a href="/citations?user=qZpr_CQAAAAJ&hl=fr&oe=ASCII&oi=ao"><b...
I'm just missing the last part: how do I tell R to extract just the 12-character code that follows "user=" in the href?
It must be pretty obvious, but I just can't figure out how to do it. If someone can help me out that would be great.
Thanks, G. Gauthier