Using R to download zipped data file, extract, and import .csv

Question

I am trying to download and extract a .csv file from a webpage using R.

This question is a duplicate of Using R to download zipped data file, extract, and import data.

I cannot get the solution to work, but it may be due to the web address I am using.

I am trying to download the .csv files from http://data.worldbank.org/country/united-kingdom (under the "Download data" drop-down).

Using @Dirk's solution from the link above, I tried:

temp <- tempfile()
download.file("http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv",temp)
con <- unz(temp, "gbr_Country_en_csv_v2.csv")
dat <- read.table(con, header=T, skip=2)
unlink(temp)

I got the extended link by looking at the page source code, which I expect is causing the problems, although it works if I paste it into the address bar.

The file downloads with the expected size:

download.file("http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv",temp)
# trying URL 'http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv'
# Content type 'application/zip' length 332358 bytes (324 Kb)
# opened URL
# downloaded 324 Kb

# also tried unzip but get this warning
con <- unzip(temp, "gbr_Country_en_csv_v2.csv")
# Warning message:
# In unzip(temp, "gbr_Country_en_csv_v2.csv") :
# requested file not found in the zip file

But these are the file names when I download them manually.

I'd appreciate some help with where I am going wrong. Thanks.

I am using Windows 8, R version 3.1.0

score 23 · Accepted Answer · edited May 27 '14 at 22:32


To get your data to download and uncompress, you need to set mode="wb":

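# mode = "wb" writes the download as binary, so the zip is not corrupted
download.file("...", temp, mode = "wb")
# extract the named file into the working directory
unzip(temp, "gbr_Country_en_csv_v2.csv")
dd <- read.table("gbr_Country_en_csv_v2.csv", sep = ",", skip = 2, header = TRUE)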

It looks like the default is "w", which assumes a text file. If it were a plain csv file this would be fine, but since the download is compressed, it is a binary file, hence the "wb". Without the "wb" part, you can't open the zip at all.
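As a rough sketch of the same idea, the download, inspection, and read steps can be wrapped in one helper. The function name `read_zipped_csv` and its `pattern` argument are hypothetical, introduced only for illustration. Listing the archive members first is one way to check the exact file name before extracting, which may help with the "requested file not found" warning from the question.

```r
# Hypothetical helper (not part of the original answer): download a zip
# archive in binary mode, find a CSV member, and read it.
read_zipped_csv <- function(url, pattern = "\\.csv$", skip = 0, ...) {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp), add = TRUE)

  # mode = "wb" writes the bytes unchanged; this matters on Windows
  download.file(url, temp, mode = "wb", quiet = TRUE)

  # list = TRUE returns a data frame describing the archive members
  # without extracting anything
  members <- unzip(temp, list = TRUE)$Name
  hits <- grep(pattern, members, value = TRUE)
  if (length(hits) == 0) {
    stop("no archive member matches pattern: ", pattern)
  }

  # unz() opens a connection to one member; read.csv() opens and closes it
  read.csv(unz(temp, hits[1]), skip = skip, ...)
}

# Possible usage with the URL from the question (assumes the archive still
# contains a file whose name starts with "gbr_Country"):
# dat <- read_zipped_csv(
#   "http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv",
#   pattern = "^gbr_Country", skip = 2)
```

Reading through `unz()` avoids leaving an extracted copy in the working directory, and `on.exit()` removes the temporary zip even if reading fails.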

edited May 27 '14 at 22:32

thelatemail


answered May 27 '14 at 22:08

MrFlick


I've been struggling with the exact same question for hours, thanks for your answer and the clear-cut explanation... Worked for me. – Pavithra Gunasekara Sep 19 '14 at 15:12

score 6 · Answer 2 · answered May 27 '14 at 22:03


Almost everything is OK. In this case you only need to specify that it's a comma-separated file, e.g. using sep="," in read.table:

temp <- tempfile()
download.file("http://api.worldbank.org/v2/en/country/gbr?downloadformat=csv", 
              temp)
con <- unz(temp, "gbr_Country_en_csv_v2.csv")
dat <- read.table(con, header=T, skip=2, sep=",")
unlink(temp)

With this little change I can import your csv smoothly.
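To illustrate what `sep = ","` and `skip = 2` do without any download, here is a small sketch on made-up inline text. The sample lines are invented and only imitate a file with two preamble lines before the header; they are not the actual World Bank contents.

```r
# Invented sample: two preamble lines, then a header, then one data row
txt <- c("Data Source,Example",
         "Last Updated,2014-05-01",
         "Country Name,Indicator Name,2012",
         "United Kingdom,\"Population, total\",63")

# skip = 2 drops the preamble; sep = "," splits fields on commas, and the
# quoted field keeps its embedded comma
dat <- read.table(text = txt, sep = ",", header = TRUE, skip = 2)

str(dat)
```

With the default `sep = ""`, read.table splits on whitespace instead, so a comma-separated file is not parsed into the intended columns.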

HTH, Luca

answered May 27 '14 at 22:03

Luca Braglia



Thanks Luca, I do need the separator. However, it is failing at the `unz` or `unzip` stage. At `read.table` it still cannot locate the file. Did the code above work for you? – user2957945 May 27 '14 at 22:12
@MrFlick Debian Gnu/Linux – Luca Braglia May 27 '14 at 22:16
@LucaBraglia OK. I think Windows has a different default or the file was crossing OS boundaries and Windows was getting goofed up trying to translate line endings. – MrFlick May 27 '14 at 22:18

G. Grothendieck · Answer 3 · 2014-05-27T22:47:55.413

5

The World Bank Development Indicators can be obtained using the WDI package. For example,

library(WDI)
inds <- WDIsearch(field = "indicator")[, 1]
GB <- WDI("GB", indicator = inds)

See the WDIsearch and WDI functions and the reference manual for more info.

edited May 27 '14 at 22:47

answered May 27 '14 at 22:19

G. Grothendieck

