I have a list of titles of academic papers that I need to download. I would like to write a loop that downloads their PDF files from the web, but I can't find a way to do it.
Here is a step-by-step outline of what I have so far (answers in either R or Python are welcome):
# Create list with paper titles (example with 4 papers from different journals)
titles <- c("Effect of interfacial properties on polymer–nanocrystal thermoelectric transport",
"Reducing social and environmental impacts of urban freight transport: A review of some major cities",
"Using Lorenz curves to assess public transport equity",
"Green infrastructure: The effects of urban rail transit on air quality")
# Loop step 1 - Query the paper title in Google Scholar to get the URL of the journal webpage that hosts the paper
# Loop step 2 - Download the PDF from the journal webpage and save it on your computer
for (i in titles) {
  journal_URL <- query_google_scholar(i)  # placeholder: this function does not exist; finding it is my question
  download.file(url = journal_URL,
                destfile = paste0(i, ".pdf"),
                mode = "wb")  # "wb" so the binary PDF is not corrupted on Windows
}
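To make the idea more concrete, here is a rough sketch of what I imagine, using the Crossref API (api.crossref.org) as a stand-in for the Google Scholar query, since Scholar has no official API as far as I know. The response fields (message$items, URL) are what I gathered from the Crossref docs, so treat this as an assumption rather than tested code:

library(httr)  # assumes httr (and jsonlite, for JSON parsing) are installed

for (i in titles) {
  # Step 1 stand-in: ask Crossref for the best title match and take its URL
  resp <- GET("https://api.crossref.org/works",
              query = list(query.title = i, rows = 1))
  stop_for_status(resp)
  hit <- content(resp, as = "parsed")$message$items[[1]]
  journal_URL <- hit$URL  # usually a DOI link to the landing page, not the PDF

  # Step 2: attempt the download; sanitize the title so it is a legal file name
  fname <- paste0(gsub("[^A-Za-z0-9 _-]", "_", i), ".pdf")
  download.file(url = journal_URL, destfile = fname, mode = "wb")
}

The obvious problem is that hit$URL normally resolves to the article's landing page rather than the PDF itself, and that last hop is exactly the part I can't figure out how to do in general.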
Complications:
Loop step 1 - The first hit on Google Scholar should be the paper's original URL. However, I've heard Google Scholar is a bit fussy about bots, so the alternative would be to query plain Google and take the first URL (hoping it points to the correct page).
Loop step 2 - Some papers are gated, so I imagine it would be necessary to include authentication info (user = __, passwd = __). If I am on my university network, though, this authentication should happen automatically, right?
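For the gated case, I imagine something along these lines with httr, where pdf_URL, my_user, and my_passwd are placeholders (on the university network the proxy presumably handles authentication transparently, so this step may not even be needed):

library(httr)

pdf_URL <- "https://journal.example.com/article/123.pdf"   # placeholder direct PDF link
resp <- GET(pdf_URL, authenticate("my_user", "my_passwd"))  # placeholder credentials
stop_for_status(resp)                                       # fail loudly on 401/403 etc.
writeBin(content(resp, as = "raw"), "paper.pdf")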
P.S. I only need to download the PDFs; I'm not interested in bibliometric information (e.g. citation records, h-index). For bibliometric data there is already some guidance here (R users) and here (Python users).