
I have a medium-sized archive (roughly 140 MB zipped). I only need to access single XML files, and as there are roughly 50,000 of them, I want to use the `unz` function to extract a single XML file instead of unzipping the whole archive.

Using the approach proposed here, my code looks like this:

library(XML)
f.path <- "path to zip-archive/"
# establish a connection to the file
dat <- unz(paste0(f.path, "BKK-Download.zip"), filename = "lists/www_s100_bh8285_1_3.xml")
# trying to parse the xml code
xml.content <- xmlParse(dat)

# which returns
# Error in file.exists(file) : invalid 'file' argument
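One possible way around this error, along the lines of lukeA's `readLines()` suggestion in the comments, is to read the member's text through the `unz()` connection first and then parse that text. The documented `file` argument of `xmlParse()` covers a file name, a URL, or (with `asText = TRUE`) the XML content itself, but not a connection object. A minimal sketch, assuming the XML package is installed; `parse_zip_member()` is a hypothetical helper name, not part of any package. It only helps once R can open the archive at all.

```r
# Sketch only: parse_zip_member() is a hypothetical helper, not part of XML.
parse_zip_member <- function(zipfile, member) {
  # Connection to a single member of the archive; nothing else is extracted.
  con <- unz(zipfile, filename = member)
  on.exit(close(con))
  # xmlParse() does not take a connection in its 'file' argument, so read
  # the member's lines first (readLines opens and closes 'con' itself) ...
  txt <- readLines(con, warn = FALSE)
  # ... then pass them as one string and tell xmlParse that it is XML text.
  XML::xmlParse(paste(txt, collapse = "\n"), asText = TRUE)
}

# Usage with the paths from the question:
# xml.content <- parse_zip_member(paste0(f.path, "BKK-Download.zip"),
#                                 "lists/www_s100_bh8285_1_3.xml")
```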

This question is very similar to this one, but I want to extract only a single file instead of unzipping all 50,000 files.
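If writing one file to disk is acceptable, base R's `unzip()` can also extract just the named member: its `files` argument restricts extraction to the listed entries, and `list = TRUE` returns the archive's table of contents without extracting anything. A sketch under those assumptions; `extract_one()` is a hypothetical helper, and extracting into a fresh temporary directory is an illustrative choice.

```r
# Sketch only: extract_one() is a hypothetical helper, not a base R function.
extract_one <- function(zipfile, member, exdir = tempfile("zip_member_")) {
  # list = TRUE only reads the table of contents; nothing is extracted.
  entries <- unzip(zipfile, list = TRUE)$Name
  if (!member %in% entries) {
    stop("'", member, "' not found in ", zipfile)
  }
  # 'files' restricts extraction to the one named entry; all other entries
  # stay inside the archive. unzip() creates 'exdir' if necessary and
  # returns the paths of the extracted files.
  paths <- unzip(zipfile, files = member, exdir = exdir)
  paths[1]
}

# Usage with the paths from the question:
# library(XML)
# xml.file <- extract_one(paste0(f.path, "BKK-Download.zip"),
#                         "lists/www_s100_bh8285_1_3.xml")
# xml.content <- xmlParse(xml.file)
```

The cost compared with the `unz()` route is a temporary file on disk, but `xmlParse()` then receives an ordinary file path, which is what its `file` argument is documented to accept.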

Any ideas on how to fix this? Any help is much appreciated!

Appendix: You can find the dataset here (direct link; source: Deutsche Bundesbank website).

Asked by David
  • Sorry, that was a typo; it is indeed paste0 in my script. – David Sep 25 '15 at 16:50
  • True, but the error comes up when I call `xmlParse`; the connection `dat` looks fine! (I tested it with a .csv file and read.data, which works fine, but not with XML.) – David Sep 25 '15 at 16:52
  • The path separator in this case is in `f.path`; it ends with a "/". – David Sep 25 '15 at 16:53
  • You may need to save it as a `gzip` file and use `gzcon`, as `help(xmlParse)` doesn't mention that any other connections can be used in its `file` argument. – Rich Scriven Sep 25 '15 at 16:57
  • What about `xmlParse(readLines(con = dat))`? – lukeA Sep 25 '15 at 17:15
  • @lukeA, good idea; however, it returns `Error in readLines(con = dat) : cannot open the connection In addition: Warning message: In readLines(con = dat) : cannot open zip file ` on my machine! – David Sep 25 '15 at 17:20
  • Cannot reproduce the error, works fine here. – lukeA Sep 25 '15 at 17:43
  • @lukeA, if I try to reproduce the error with another .zip archive, it works fine! However, with the "Bundesbank" zip archive it doesn't work and returns the error mentioned before. Did you try to reproduce the error with another zip or with the "Bundesbank" file? – David Sep 26 '15 at 12:18
  • @David with "Bundesbank" – lukeA Sep 26 '15 at 12:22
  • @lukeA, OK. Thank you for your help! I "rezipped" the files and now everything works fine with the Bundesbank archive. As I don't have admin rights on the machine, I assume it has something to do with access rights... Again, thanks for the effort! :) – David Sep 26 '15 at 12:41
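Because the same code worked once the archive was re-zipped, the "cannot open zip file" message points at the archive file itself (its permissions or its format) rather than at `xmlParse()`. A rough check can separate those two cases: `file.access(..., mode = 4)` tests read permission, and `unzip(..., list = TRUE)` tests whether R's internal unzip can read the archive's table of contents. This is a diagnostic sketch only, not a confirmed explanation of what went wrong here; `check_zip()` is a hypothetical helper name.

```r
# Sketch only: check_zip() is a hypothetical diagnostic helper.
check_zip <- function(zipfile) {
  # file.access() returns 0 on success; mode = 4 tests read permission.
  readable <- file.exists(zipfile) && file.access(zipfile, mode = 4) == 0
  entries <- NULL
  if (readable) {
    # unzip(list = TRUE) only reads the table of contents; it signals an
    # error if the file cannot be opened as a zip archive.
    entries <- tryCatch(unzip(zipfile, list = TRUE)$Name,
                        error = function(e) NULL)
  }
  list(readable = readable,
       listable = !is.null(entries),
       n_entries = length(entries))
}

# Usage:
# check_zip(paste0(f.path, "BKK-Download.zip"))
```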
