Based on this answer by Dirk Eddelbuettel, I am trying to read an XML file from a zip archive for further processing. Apart from the URL and file names, the only change to the referenced code is that I replaced `read.table` with `xmlInternalTreeParse`.

library(XML)
temp <- tempfile()
download.file("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data%2Fnrg_105a.sdmx.zip",temp)
doc <- xmlInternalTreeParse(unz(temp, "nrg_105a.dsd.xml"))
unlink(temp)
closeAllConnections()

However, this returns the following error:

Error in file.exists(file) : invalid 'file' argument

`traceback()` shows that the failing call comes from within the parser, so `unz(temp, ...)` seems to be an inappropriate reference in this context. Is there a way to make this work?

Tungurahua
  • `xmlInternalTreeParse` doesn't appear to work the same way as `read.table`. Whereas `read.table` can take a connection object, `xmlInternalTreeParse` requires a file name (as a character string) according to the documentation. – MrFlick Jul 28 '14 at 23:14
  • Hmm, I never really understood what a connection is. So I probably need to convert the connection to a character vector with `readLines` or something similar. – Tungurahua Jul 28 '14 at 23:22

1 Answer

You can try:

library(XML)

## Make a temporary file (tf) inside the session's temporary folder (tdir)
tdir <- tempdir()
tf <- tempfile(tmpdir = tdir)

## Download the zip file (mode = "wb" keeps the binary archive intact on Windows)
download.file("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data%2Fnrg_105a.sdmx.zip", tf, mode = "wb")

## Unzip it in the temp folder
xml_files <- unzip(tf, exdir = tdir)

## Parse the first file
doc <- xmlInternalTreeParse(xml_files[1])

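## Delete the downloaded archive and the extracted files
## (unlinking the folder itself would remove the session's temp directory)
unlink(c(tf, xml_files))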
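
The code above parses whichever file happens to come first in the archive. If only one specific member is wanted (the question asks for `nrg_105a.dsd.xml`), a possible variation is to extract just that file. This is only a sketch: `parse_zip_member` is a hypothetical helper name, and it assumes the XML package is installed and that the member name matches an entry listed by `unzip(..., list = TRUE)`.

```r
library(XML)

# Hypothetical helper: extract one named member of a zip archive into a
# throwaway folder, parse it, and clean up afterwards.
parse_zip_member <- function(zip_path, member, exdir = tempfile("unzipped_")) {
  # List the archive contents first so a wrong name fails with a clear message
  listed <- unzip(zip_path, list = TRUE)$Name
  if (!member %in% listed) {
    stop("'", member, "' not found; archive contains: ",
         paste(listed, collapse = ", "))
  }

  # Extract only the requested member; unzip() creates exdir if needed
  path <- unzip(zip_path, files = member, exdir = exdir)

  # Remove the extraction folder once the function returns
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)

  # The parser receives a file name (a character string), not a connection
  xmlParse(path)
}
```

With the archive downloaded to `tf` as above, this could be called as `doc <- parse_zip_member(tf, "nrg_105a.dsd.xml")`.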
alko989
  • Great, this works. On closer inspection, both scripts do essentially the same thing, but yours uses `unzip` instead of `unz`. Swapping `unz` for `unzip` makes the original script run as well. – Tungurahua Jul 28 '14 at 23:50
  • The problem is that `xmlInternalTreeParse` needs a filename, not a connection (which is what `unz` returns). You are right that `unzip` also works in the original script, but there it saves the extracted XML in your current working directory. – alko989 Jul 29 '14 at 00:02