I am trying to request an XML document in two different ways (XML::xmlParse and httr::GET) and expect the response to be the same. The response I get with xmlParse is what I expect, but with httr::GET the query term in my request URL gets truncated at some point.

An example:

require(httr)
require(XML)
require(rvest)

term <- "alopecia areata"
request <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=",term)  

#requesting URL with XML
xml_response <- xmlParse(request)

xml_response %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

This returns, as it should

[1] "alopecia areata"        

Now for httr

httr_response <- GET(request)
httr_content <- content(httr_response)

httr_content %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

This returns

[1] "alopecia"

What's interesting: if we check the httr_response element for the requested URL, it's correct. Only the response is wrong.

> httr_response$request$opts$url

[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=alopecia areata"

> httr_response$url

[1] "http://eutils.ncbi.nlm.nih.gov/gquery?term=alopecia&retmode=xml"

So at some point my query term got truncated. If the whole request is put into a browser by hand, it behaves as expected.

Any suggestions on how to resolve this would be greatly appreciated.

1 Answer

You can try replacing the space in your URL with a + to prevent the term from being truncated:

httr_response <- GET(gsub(" ","+",request))
httr_content <- content(httr_response)

httr_content %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

#[1] "alopecia areata"

More info about spaces and URLs here
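
As an alternative to substituting + by hand, the search term can be percent-encoded before it is pasted into the URL. The following is only a minimal sketch using base R's URLencode; the names base_url and encoded_term are introduced here for illustration and are not part of the original question. It assumes that the server treats %20 the same way as +.

```r
term <- "alopecia areata"

# base_url is a name introduced for this sketch; it holds the endpoint without the query string
base_url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi"

# Percent-encode the term so the space becomes %20 instead of a raw space
encoded_term <- URLencode(term, reserved = TRUE)

# Build the request URL from the encoded term
request <- paste0(base_url, "?term=", encoded_term)
print(request)
```

The resulting request can then be passed to GET(request) as in the code above. httr's GET also accepts a query argument, for example GET(base_url, query = list(term = term)), which should take care of the encoding for you.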

NicE