Running R 3.2.0, RStudio 0.99.441, Windows 7 32-bit, XML package 3.98-1.2.

I am trying to read an XML file from the URL below with xmlTreeParse from the XML package, but I keep getting an error.

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

> fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
> doc <- xmlTreeParse(fileURL, useInternal = TRUE)
Error: XML content does not seem to be XML: 'https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml' 

I also tried download.file() followed by xmlTreeParse on the local copy:

download.file(fileURL, destfile = "data.xml")
doc <- xmlTreeParse("data.xml", useInternalNodes = TRUE)

When I do this there is no immediate error, but the variable 'doc' appears to have no structure, and I'm not sure how to read it from this point.

Matt Boudas
  • Because the URL uses https, try adding the `method = "curl"` argument to `download.file`; the problem is probably there. I'm on a Mac, where only the curl option works, so I can't compare the behaviour with and without curl. Let us know if it works. – SabDeM Jun 08 '15 at 16:25
  • Running `download.file(fileURL, destfile = "data.xml", method = "curl")` gives two warnings: (1) running command 'curl "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml" -o "data.xml"' had status 127; (2) In download.file(fileURL, destfile = "data.xml", method = "curl") : download had nonzero exit status. I still have no structure: when I run `rootNode[[1]]` I get back the entire XML document. – Matt Boudas Jun 08 '15 at 16:34
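
A status of 127 from `download.file(..., method = "curl")` usually means the `curl` command could not be run at all, which is common on Windows where curl is often not on the PATH. The following is a minimal sketch of checking for curl before choosing a download method; the variable names `curl_path` and `dl_method` are illustrative assumptions, and whether the fallback method can fetch https depends on the R version and build.

```r
# Look up the curl executable on the PATH; Sys.which() returns "" if it is absent.
curl_path <- Sys.which("curl")

# Use curl only when it is actually available; otherwise let R pick a method.
# (Assumption: the "auto" method may or may not support https on a given build.)
dl_method <- if (nzchar(curl_path)) "curl" else "auto"

message("curl found at: ", if (nzchar(curl_path)) curl_path else "<not found>")
message("download method to use: ", dl_method)
```

The chosen value could then be passed as `download.file(fileURL, destfile = "data.xml", method = dl_method)`.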

2 Answers

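Remove the "s" from https; the XML package's parsers cannot fetch https URLs directly:

library(XML)
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
fileURL <- sub('^https', 'http', fileURL)  # switch the scheme to plain http
doc <- htmlParse(fileURL)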
agstudy

This worked for me:

library(XML)
fileURL <- "https://www.w3schools.com/xml/simple.xml"
download.file(fileURL, destfile = "data.xml", method = "curl")
doc <- xmlTreeParse("data.xml", useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
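
Once `rootNode` is available, the document can be explored with `xmlName`, `xmlSize`, and XPath queries. A minimal sketch follows, using a small inline document so it runs offline; the element names (`response`, `row`, `name`, `zipcode`) and the nested `row` layout are assumptions made for illustration, not taken from the file in the question. If a root has a single wrapper child, `rootNode[[1]]` returns that wrapper and therefore prints nearly the whole document, which may be what the question describes.

```r
library(XML)

# Hypothetical miniature document: one wrapper <row> holding the real records.
# Built without whitespace so there are no blank text nodes to consider.
xml_text <- paste0(
  "<response><row>",
  "<row><name>Cafe A</name><zipcode>21231</zipcode></row>",
  "<row><name>Cafe B</name><zipcode>21201</zipcode></row>",
  "</row></response>"
)

# asText = TRUE tells the parser the first argument is XML content, not a path.
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

root_name <- xmlName(rootNode)        # name of the top-level element
n_top <- xmlSize(rootNode)            # number of direct children of the root
n_records <- xmlSize(rootNode[[1]])   # children of the wrapper element

# XPath finds matching nodes at any depth, so the nesting does not matter.
zipcodes <- xpathSApply(rootNode, "//zipcode", xmlValue)
places <- xpathSApply(rootNode, "//name", xmlValue)

print(data.frame(name = places, zipcode = zipcodes))
```

Querying by element name with `xpathSApply` avoids having to know how deeply the records are nested.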
Math Expert