Trying to move on from my troubles with RISmed (see Problems with RISmed and large(ish) data sets), I decided to use rentrez and entrez_summary to retrieve a large list of PubMed titles from a query:
rm(list=ls())
library(rentrez)
set_entrez_key("######") #I did provide my real API key here
Sys.getenv("ENTREZ_KEY")
query="(United States[AD] AND France[AD] AND 1995:2020[PDAT])"
results<-entrez_search(db="pubmed",term=query,use_history=TRUE)
results
results$web_history
summary.append.l <- list() #start empty so the first batch is not added twice
for (seq_start in seq(0, max(results$count - 1, 0), by = 100)) {
  Sys.sleep(0.1) #slow things down in case THAT'S a factor here....
  summary.append.l <- append(
    summary.append.l,
    entrez_summary(
      db = "pubmed",
      web_history = results$web_history,
      retmax = 100,
      retstart = seq_start
    )
  )
}
The good news... I didn't get a flat-out rejection from NCBI like I did with RISmed and EUtilsGet. The bad news... it's not completing. I get either
Error in curl::curl_fetch_memory(url, handle = handle) :
transfer closed with outstanding read data remaining
or
Error: parse error: premature EOF
(right here) ------^
I almost think there's something about using an affiliation search string in the query, because if I change the query to
query="monoclonal[Title] AND antibody[Title] AND 2010:2020[PDAT]"
it completes the run, despite having about the same number of records to deal with. So...any ideas why a particular search string would result in problems with the NCBI servers?
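To help narrow down whether specific batches are at fault, here is a minimal sketch of a batch loop that retries each request and records which offsets never succeed. The helper name fetch_in_batches and its arguments (fetch_fun, max_tries, wait) are hypothetical and not part of rentrez; the retry counts and delays are arbitrary assumptions, not NCBI recommendations.

```r
# Hypothetical helper (not part of rentrez): fetch records in batches,
# retrying each batch a few times and noting which offsets still fail.
fetch_in_batches <- function(fetch_fun, total, batch_size = 100,
                             max_tries = 3, wait = 0.5) {
  out <- list()
  failed <- numeric(0)
  if (total <= 0) {
    return(list(records = out, failed_starts = failed))
  }
  for (s in seq(0, total - 1, by = batch_size)) {
    batch <- NULL
    for (attempt in seq_len(max_tries)) {
      # Turn any error into NULL so the loop can retry instead of stopping
      batch <- tryCatch(fetch_fun(s, batch_size), error = function(e) NULL)
      if (!is.null(batch)) break
      Sys.sleep(wait * attempt) # back off a little more on each retry
    }
    if (is.null(batch)) {
      failed <- c(failed, s) # remember the offset that never succeeded
    } else {
      out <- c(out, batch)
    }
  }
  list(records = out, failed_starts = failed)
}

# Usage with rentrez (needs network access, so left commented out):
# res <- fetch_in_batches(
#   function(start, n) entrez_summary(
#     db = "pubmed", web_history = results$web_history,
#     retmax = n, retstart = start, always_return_list = TRUE
#   ),
#   total = results$count
# )
# res$failed_starts
```

If failed_starts comes back with the same offsets on every run, that would point at particular records; if the offsets vary between runs, the failures look more like transient connection drops.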