trying to download this zipped file from the CDC with R. it works fine from firefox, so i tried setInternet2(TRUE) right away, but that still didn't work.

in every case below, i get:

z<-unzip(tf)

Warning message:
In unzip(tf) : zip file is corrupt

here are the two starting lines for all of my attempts:

fn <- 'ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/dvs/natality/nat2012us.zip'
tf <- tempfile() ; td <- tempdir()

and here's what i tried:

# fails
download.file(fn,tf,mode='wb')
z <- unzip( tf , exdir = td )

# fails
setInternet2(TRUE)
download.file(fn,tf,mode='wb')
z <- unzip( tf , exdir = td )

# fails
download.file(fn,tf,mode='wb',cacheOK=FALSE)
z <- unzip( tf , exdir = td )

# fails
setInternet2(TRUE)
download.file(fn,tf,mode='wb',cacheOK=FALSE)
z <- unzip( tf , exdir = td )

# fails
library(downloader)
download(fn,tf,mode='wb')
z <- unzip( tf , exdir = td )

# fails
library(httr)
resp <- GET(fn)
writeBin(content(resp, "raw"), tf)
z <- unzip( tf , exdir = td )

# fails
library(RCurl)
x <- getBinaryURL( fn )
writeBin( x , tf )
z <- unzip(tf)


# in every case:
> file.info(tf)$size
[1] 228799759

sorry if it's something stupid

Anthony Damico
  • Have you got a smaller example file it fails on? Or a smaller example it works on? 200 MB is a bit of a download... Also a big WARNING 200 MB file!!! would have been appreciated! – Spacedman Apr 19 '14 at 10:25
  • If you download with R and try to unzip in Windows, does Windows say it's corrupt? In other words, is it R's download that's getting it wrong or R's unzip? Zip archives can use many compression algorithms, and maybe R's unzip doesn't have the One. – Spacedman Apr 19 '14 at 10:32
  • @Spacedman i don't, sorry. this problem appears to be windows decompression in R, and it's similar to http://stackoverflow.com/questions/16096192/how-to-programmatically-extract-or-unzip-a-7z-7-zip-file-with-r – Anthony Damico Apr 19 '14 at 10:34
  • Yeah, help(unzip) is pretty confessional about how dysfunctional it is. – Spacedman Apr 19 '14 at 10:35
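
one rough way to separate "bad download" from "bad unzip", as asked in the comments above, is to look at the first bytes of the downloaded file. this is only a sketch: `looks_like_zip` is a made-up helper name, not something from the thread, and it only checks the four-byte signature that a non-empty zip archive normally starts with. it says nothing about whether the rest of the archive is intact.

```r
# hypothetical helper (not from the original thread): TRUE when the file
# begins with the bytes "PK\003\004", the local file header signature
# found at the start of a non-empty zip archive
looks_like_zip <- function( path ) {
    sig <- readBin( path , what = "raw" , n = 4 )
    identical( sig , as.raw( c( 0x50 , 0x4b , 0x03 , 0x04 ) ) )
}

# usage against the downloaded tempfile from the question:
# looks_like_zip( tf )
# unzip( tf , list = TRUE )   # list the contents without extracting
```

if the signature is there and the file size matches what the browser download gives, the download itself is probably fine and the extraction step is the more likely culprit.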

1 Answer

looks like the windows unzip = "internal" method is the problem. calling winrar through shell() works around it:

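# download the zipped file in binary mode
fn <- 'ftp://ftp.cdc.gov/pub/health_statistics/nchs/datasets/dvs/natality/nat2012us.zip'
tf <- tempfile()
download.file( fn , tf , mode = 'wb' )

# path to the winrar executable (adjust to wherever it is installed)
wr <- normalizePath( "C:/Program Files/WinRAR/WinRAR.exe" )

# extract ( the winrar "x" command ) into the temporary directory
td <- tempdir()
shell( paste0( '"' , wr , '" x ' , tf , ' ' , td ) )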
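
the same idea can also be sketched without building the command line by hand, since `unzip()` takes an `unzip =` argument that can name an external program instead of "internal". the wrapper below is hypothetical (the name `unzip_with_fallback` is mine), and it assumes an `unzip` executable is on the PATH, which is often not the case on a plain windows install.

```r
# hypothetical wrapper: try R's internal method first, and hand the file
# to an external unzip program if the internal method warns or errors
unzip_with_fallback <- function( zipfile , exdir , external = unname( Sys.which( "unzip" ) ) ) {

    # a warning such as "zip file is corrupt" is turned into NULL here
    out <- tryCatch(
        unzip( zipfile , exdir = exdir ) ,
        warning = function( w ) NULL ,
        error = function( e ) NULL
    )

    # the internal method returns the extracted file paths on success
    if ( length( out ) > 0 ) return( out )

    if ( !nzchar( external ) ) {
        stop( "internal unzip failed and no external unzip program was found" )
    }

    # with an external program, files are written to exdir but the
    # extracted paths are not returned
    unzip( zipfile , exdir = exdir , unzip = external )
}

# usage with the objects from above:
# unzip_with_fallback( tf , td )
```

after an external extraction, `list.files( td )` shows what actually landed in the directory.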
Anthony Damico