
I'm trying to download and decompress a gzip file, then convert the decompressed TSV file into CSV, which would be easier to parse. I want the data from the "Download Table" link in this URL. I am using the same idea as in this post, but I get the error `IOError: Not a gzipped file` on the line `outfile.write(decompressedFile.read())`. My code is as follows:

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
filename = "D:\Sidney\irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format
user131983
  • You should probably output the first few bytes of `compressedFile` and check that it actually looks like a gzip file. A few things may be going on here; one possibility is that the server is giving you an error page because your download request is missing a request parameter or cookie, or it doesn't like the user agent. As a side note, I would highly recommend using the Requests package (http://docs.python-requests.org/en/latest/) instead of urllib2. – Tom Dalton Jun 16 '15 at 15:28
  • Use a raw string for Windows paths: `filename = r"D:\Sidney\irt_euryld_d.tsv.gz"`. Won't make any difference here, but a general comment for safety. – cdarke Jun 16 '15 at 15:38
  • @cdarke Thanks. However, I still get the error. – user131983 Jun 16 '15 at 15:43
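A minimal sketch of the check suggested in the first comment, assuming Python 3 rather than the Python 2 used in the question. Every gzip stream starts with the two bytes 0x1f 0x8b, so an HTML error page is easy to tell apart from a real archive. The helper name `looks_like_gzip` and both payloads are hypothetical stand-ins, not data from the actual server.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Stand-ins for a real download: one valid gzip payload, one HTML error page.
good_payload = gzip.compress(b"geo\ttime\nEA\t2015\n")
error_page = b"<html><body>Error</body></html>"

print(looks_like_gzip(good_payload))
print(looks_like_gzip(error_page))
print(repr(error_page[:20]))  # peeking at the first bytes shows what was sent
```

Running this kind of check on `response.read()` before handing the bytes to `gzip` turns the opaque "Not a gzipped file" error into something you can read directly.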

1 Answer


Basically, you are trying to pull the wrong file. If you check the response in your code, you will see that you get an HTML error page. You are appending your own local path to the URL, which produces a wrong URL.

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)
print response
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format

The difference is the `filename` line and a small addition to `baseURL` (the `file=` query parameter). The line is now `filename = "data/irt_euryld_d.tsv.gz"`, which is the correct file name according to the link you specified.
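As an illustration of how that URL is put together, here is a small sketch assuming Python 3's `urllib.parse` (the code in this thread is Python 2). The function name `build_download_url` is hypothetical; the base URL and file name are the ones quoted above. Passing `safe="/"` keeps the slash in `data/...` unescaped so the result matches the plain string concatenation.

```python
from urllib.parse import urlencode

# Base URL from the thread, without the trailing "?" or "file=" part.
BASE_URL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing"


def build_download_url(remote_file):
    """Attach the server-side file path as the `file` query parameter."""
    query = urlencode({"file": remote_file}, safe="/")
    return BASE_URL + "?" + query


url = build_download_url("data/irt_euryld_d.tsv.gz")
print(url)
```

The point is that the value after `file=` is a path on the server, so a local path such as `D:\Sidney\...` never belongs in it.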

The other change is this line: `outFilePath = filename.split('/')[1][:-3]`

which could be better written as

outFilePath = os.path.join('D:\\', 'Sidney', filename.split('/')[1][:-3])
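The code above creates the `csv.reader` and `csv.writer` but stops before copying any rows. A minimal sketch of the whole decompress-and-convert step, assuming Python 3 and working in memory, might look like the following. The function name `gzip_tsv_to_csv`, the UTF-8 encoding, and the sample rows are assumptions for illustration, not values taken from the real file.

```python
import csv
import gzip
import io


def gzip_tsv_to_csv(compressed, encoding="utf-8"):
    """Decompress gzip bytes holding TSV text and return the rows as CSV text."""
    # The encoding is an assumption; adjust it if the real file differs.
    text = gzip.decompress(compressed).decode(encoding)
    out = io.StringIO()
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)  # the step that actually copies the data
    return out.getvalue()


# Invented sample standing in for the downloaded bytes.
sample = gzip.compress(b"geo\ttime\tvalue\nEA\t2015M06\t0.25\n")
print(gzip_tsv_to_csv(sample))
```

With a real download you would pass `response.read()` as `compressed` and write the returned string to `ECB.csv`.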
Srgrn
  • Thank you. Just one question: did you mean `outFilePath` instead of `outFileName` in `outFileName = os.join('D:','Sidney',filename.split('/')[1][:-3])`? – user131983 Jun 16 '15 at 15:54