How can I create a pandas DataFrame from a CSV file that is compressed in a tar.gz archive? I found this code, which does that for a zip file. What should I change in the following code to make it work with tar.gz, without first downloading the tar.gz and CSV file?

import io, zipfile
import pandas, requests
r = requests.get('http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip')
# r.content is bytes, so wrap it in io.BytesIO (the Python 3 replacement for StringIO.StringIO)
z = zipfile.ZipFile(io.BytesIO(r.content))
df = pandas.read_csv(z.open('sample_CSV.csv'))

My file is https://ghtstorage.blob.core.windows.net/downloads/mysql-2016-06-16.tar.gz

Geet
    Note that the compressed file is almost 40 GB. It will be decompressed and loaded into memory. How much RAM do you have? – ayhan Aug 28 '16 at 12:42

2 Answers

Try the following to extract the tar.gz archive:

import tarfile

# fname is the path to the .tar.gz archive
with tarfile.open(fname, "r:gz") as tar:
    tar.extractall()  # extracts every member into the current directory
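
If the goal is to get a DataFrame without extracting files to disk, the same tarfile module can hand a single member to pandas as a file-like object. The following is only a minimal sketch: the helper `read_csv_from_targz` and the member name `sample.csv` are made up for illustration, and the tiny archive is built in memory so the example runs without network access.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_targz(fileobj, member_name, **read_csv_kwargs):
    """Read one CSV member of a gzip-compressed tar archive into a DataFrame.

    fileobj is any binary file-like object holding the .tar.gz bytes.
    (Hypothetical helper, not part of pandas or tarfile.)
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        # extractfile returns a readable file object; nothing is written to disk.
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name!r} is not a regular file in the archive")
        # Parse while the archive is still open.
        return pd.read_csv(member, **read_csv_kwargs)


# Build a tiny .tar.gz in memory as stand-in data.
csv_bytes = b"id,name\n1,alpha\n2,beta\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="sample.csv")  # hypothetical member name
    info.size = len(csv_bytes)
    tar.addfile(info, io.BytesIO(csv_bytes))
buf.seek(0)

df = read_csv_from_targz(buf, "sample.csv")
print(df)
```

For a downloaded response, `io.BytesIO(r.content)` could be passed as `fileobj`, as in the zip example in the question. That keeps the whole archive in memory, which is unlikely to be practical for an archive of roughly 40 GB.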
Raghav Garg

Simply pass your .tar.gz file as the file name
to read_csv, and it will decompress and open it automatically,
since this is the default behavior for .gz files.

Make sure the extension is in lower case.
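
To illustrate the extension-based inference, here is a small sketch using a plain gzip-compressed CSV. The file name `sample.csv.gz` is hypothetical and the file is written to a temporary directory. Keep in mind that a .tar.gz is a tar archive wrapped in gzip rather than a bare gzipped CSV. As far as I know, read_csv only gained direct support for tar archives in pandas 1.5, and it expects the archive to contain a single data file, so this may not apply to every archive or pandas version.

```python
import gzip
import os
import tempfile

import pandas as pd

csv_bytes = b"id,name\n1,alpha\n2,beta\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.csv.gz")  # hypothetical file name
    with gzip.open(path, "wb") as f:
        f.write(csv_bytes)

    # compression="infer" is the default; the ".gz" suffix selects gzip.
    inferred = pd.read_csv(path)
    # The same read with the compression named explicitly.
    explicit = pd.read_csv(path, compression="gzip")

print(inferred.equals(explicit))
```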

Israel Unterman