
Trying to move on from my troubles with RISmed (see "Problems with RISmed and large(ish) data sets"), I decided to use rentrez and entrez_summary to retrieve a large list of PubMed titles from a query:

rm(list=ls())
library(rentrez) #load the package before calling set_entrez_key()
set_entrez_key("######") #I did provide my real API key here
Sys.getenv("ENTREZ_KEY")
query="(United States[AD] AND France[AD] AND 1995:2020[PDAT])"
results<-entrez_search(db="pubmed",term=query,use_history=TRUE)
results
results$web_history
summary.append.l <- list() #start empty so the first batch is not fetched twice
for (seq_start in seq(0, results$count - 1, 100)) { #retstart is zero-based
    Sys.sleep(0.1) #slow things down in case THAT'S a factor here....
    summary.append.l <- append(
        summary.append.l,
        entrez_summary(
            db = "pubmed", 
            web_history = results$web_history, 
            retmax = 100, 
            retstart = seq_start
        )
    )
}

The good news: I didn't get a flat-out rejection from NCBI like I did with RISmed and EUtilsGet. The bad news: the loop doesn't complete. I get either

Error in curl::curl_fetch_memory(url, handle = handle) : 
  transfer closed with outstanding read data remaining

or

Error: parse error: premature EOF
                                       
                     (right here) ------^
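
To check whether these failures are transient, each batch request could be wrapped in a retry helper. This is only a minimal sketch, assuming the errors are intermittent truncated responses (which I have not confirmed); `with_retry` is a hypothetical helper of my own, not part of rentrez:

```r
# Hypothetical helper (not part of rentrez): call f() up to max_tries times,
# pausing `wait` seconds between attempts, and re-raise the last error
# if every attempt fails.
with_retry <- function(f, max_tries = 3, wait = 1) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting the loop
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt ", attempt, " failed: ", conditionMessage(last_error))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop(last_error)
}
```

In the loop above, a request would then look like with_retry(function() entrez_summary(db = "pubmed", web_history = results$web_history, retmax = 100, retstart = seq_start)). The logged messages would also show which retstart values fail.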

I suspect the problem is related to using an affiliation search in the query, because if I change the query to

query="monoclonal[Title] AND antibody[Title] AND 2010:2020[PDAT]"

it completes the run, despite having about the same number of records to deal with. So...any ideas why a particular search string would result in problems with the NCBI servers?
