I want to get 'summary_doc', but the last line of my code fails with an error. What should I do?

The code is below.


--------------------------------------------------------------
library(XML)

base_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

db = "pubmed"

query = "human+genome+AND+2014"

esearch = sprintf("esearch.fcgi?db=%s&term=%s",db,query) 

search_url = paste(base_url, esearch, sep="")

search_doc = xmlParse(search_url)

retmax = 9000

new_esearch = sprintf("esearch.fcgi?db=%s&term=%s&retmax=%s",db,query,retmax)

new_search_url= paste(base_url,new_esearch,sep='')

new_search_doc = xmlParse(new_search_url)

ids = xpathSApply(new_search_doc,path="//IdList/Id",fun='xmlValue')

id_list = paste(ids, coppapse=',')

esummary = sprintf("esummary.fcgi?db=%s&id=%s",db, id_list)

sum_url = paste(base_url, esummary, sep='')

summary_doc = xmlParse(sum_url)  #this line makes error

Error: XML content does not seem to be XML: 
user20650

1 Answer

So there are a couple of things going on here.

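First, you misspelled collapse as coppapse (!!). paste(...) does not complain about the unknown argument; it treats it as one more string to paste, so id_list ends up as a vector with " ," appended to every ID instead of a single comma-separated string. Printing id_list would have shown this right away.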
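To make the effect of the typo concrete, here is a small sketch with three made-up IDs (not real PubMed IDs). It only compares the misspelled call with the corrected one.

```r
ids <- c("111", "222", "333")  # made-up IDs for illustration

# Misspelled argument: paste() treats `coppapse` as one more string to paste,
# joined with the default sep = " ". The result is still a vector of length 3.
bad <- paste(ids, coppapse = ",")
print(bad)

# Correct argument: the vector is collapsed into a single string.
good <- paste(ids, collapse = ",")
print(good)
```

Because sprintf(...) is vectorized, the length-3 vector would then yield three separate URLs rather than one.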

Second, even once you fix that, you are issuing a GET request whose query string contains almost 8,400 8-character IDs concatenated together. This produces a 414 error (Request-URI Too Long). One way to deal with this is to make multiple smaller requests, though I don't recommend it.
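For completeness, the multiple-request route could be sketched roughly as follows. The IDs are made up, and the chunk size of 200 is an arbitrary assumption, not a documented limit; each element of id_strings would then go into its own esummary request.

```r
# Made-up 8-digit IDs standing in for the real search result.
ids <- sprintf("%d", 10000001:10000450)

chunk_size <- 200  # assumed value; pick whatever keeps the URL short enough

# Group the IDs into consecutive chunks of at most chunk_size elements.
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

# One comma-separated string per chunk, ready to use as the `id` parameter.
id_strings <- vapply(chunks, paste, character(1), collapse = ",")

print(length(id_strings))
```

The downside is that you then have to issue one request per chunk and merge the resulting documents yourself.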

The URL length limitation does not apply to POST requests, because the parameters travel in the request body, so you are better off sending the IDs that way. Note the use of GET(...) and POST(...) in the httr package. These functions let you avoid building queries by hand with sprintf(...), and lead to much more readable, reliable, and reproducible code.

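library(XML)
library(httr)
url <- "http://eutils.ncbi.nlm.nih.gov"

# Search step: httr URL-encodes the query list, so no sprintf(...) is needed.
response <- GET(url, path = "entrez/eutils/esearch.fcgi",
                query = list(db = "pubmed", term = "human genome AND 2014", retmax = 9000))
# Parse with the XML package explicitly, so the XPath call below works
# regardless of which parser content(...) would pick by default.
doc      <- xmlParse(content(response, as = "text", encoding = "UTF-8"), asText = TRUE)
ids      <- xpathSApply(doc, "//IdList/Id", xmlValue)

# Summary step: the IDs travel in the form-encoded POST body, not in the URL.
result   <- POST(url, path = "entrez/eutils/esummary.fcgi", encode = "form",
                 body = list(db = "pubmed", id = paste(ids, collapse = ",")))
doc      <- xmlParse(content(result, as = "text", encoding = "UTF-8"), asText = TRUE)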
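Once you have a parsed document, you can print it or pull fields out with XPath. The sketch below uses a small made-up XML string in place of a real response so that it runs offline; the DocSum/Id and Item Name="Title" layout is my assumption about what esummary returns, so check it against your actual doc.

```r
library(XML)

# Made-up stand-in for an esummary response (not real PubMed data).
sample_xml <- '<eSummaryResult>
  <DocSum><Id>1</Id><Item Name="Title" Type="String">First title</Item></DocSum>
  <DocSum><Id>2</Id><Item Name="Title" Type="String">Second title</Item></DocSum>
</eSummaryResult>'
doc <- xmlParse(sample_xml, asText = TRUE)

# Printing the document shows the whole XML tree.
print(doc)

# Extract one value per record with XPath.
pmids  <- xpathSApply(doc, "//DocSum/Id", xmlValue)
titles <- xpathSApply(doc, "//DocSum/Item[@Name='Title']", xmlValue)

# Collect the fields into a data frame for easier inspection.
summary_df <- data.frame(id = pmids, title = titles, stringsAsFactors = FALSE)
print(summary_df)
```

If you want a copy on disk, saveXML(doc, file = "summary.xml") from the XML package writes the tree to a file (the file name here is just an example).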
jlhoward
  • Thank you for your answer; it really helped me. I have one more question: can I see the contents of 'doc'? – user4329049 Dec 06 '14 at 13:08