I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the files and read them efficiently?
-
See my answer here [without downloading zip files] https://stackoverflow.com/a/45771620/348168 – Vinod Aug 19 '17 at 12:51
10 Answers
I used the zipfile module to import the ZIP directly into a pandas DataFrame.
Let's say the file is named "intfile.csv" and it's inside a .zip named "THEZIPFILE.zip":
import pandas as pd
import zipfile
zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip')
df = pd.read_csv(zf.open('intfile.csv'))
If you aren't using Pandas it can be done entirely with the standard lib. Here is Python 3.7 code:
import csv
from io import TextIOWrapper
from zipfile import ZipFile
with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        reader = csv.reader(TextIOWrapper(infile, 'utf-8'))
        for row in reader:
            # process the CSV here
            print(row)
-
I tried doing this not realizing that I needed io.TextIOWrapper. How could I have known? – Ken Ingram Jul 21 '20 at 12:14
-
@KenIngram ZipFile.open() gives you a zipfile.ZipExtFile object, which is a binary stream. The built-in open() function, in text mode, returns an _io.TextIOWrapper object directly – Dimitri_Fu Jul 14 '21 at 19:07
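To illustrate the point in the comments, here is a minimal sketch that builds a tiny archive in memory and compares the two ways of reading it. The member name `demo.csv` and its contents are made up for the example.

```python
import csv
import io
import zipfile

# Build a small zip archive in memory (the member name "demo.csv" is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo.csv", "name,value\nalpha,1\nbeta,2\n")

with zipfile.ZipFile(buffer) as zf:
    # ZipFile.open() returns a binary file object, so reads give bytes.
    with zf.open("demo.csv") as raw:
        first_line = raw.readline()

    # Wrapping the binary stream decodes it to str, which csv.reader expects.
    with zf.open("demo.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.reader(text))

print(type(first_line).__name__)
print(rows)
```

Passing the raw ZipExtFile straight to csv.reader fails on Python 3 because the reader receives bytes instead of str, which is why the wrapper is needed.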
A quick solution is the code below:
import pandas as pd
# pandas can read a zipped CSV directly (the archive must contain a single file)
df = pd.read_csv("/path/to/file.csv.zip")

-
Outstanding answer! I checked that the same solution also works without the ".csv" extension: `df = pd.read_csv("/path/to/file.zip")` – Gian Arauz Mar 04 '21 at 14:46
zipfile also supports the with statement.
So, building on Yaron's answer that uses pandas:
import pandas as pd
import zipfile
with zipfile.ZipFile('file.zip') as myZip:
    with myZip.open('file.csv') as myZipCsv:
        df = pd.read_csv(myZipCsv)

I thought Yaron had the best answer, but I wanted to add code that iterates through multiple files inside a zip archive and then concatenates the results:
import os
import pandas as pd
import zipfile
curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '/targetfolder.zip')
text_files = zf.infolist()
list_ = []
print("Uncompressing and reading data... ")
for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)
df = pd.concat(list_)
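A variation on the loop above, offered as a rough sketch rather than the answerer's exact method: it skips directory entries and non-CSV members, closes each member after reading, and records which file each row came from. The function name `read_all_csvs`, the `source_file` column, and the demo file names are all hypothetical.

```python
import io
import zipfile

import pandas as pd


def read_all_csvs(zip_source):
    """Concatenate every .csv member of a zip archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip folders and anything that does not look like a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with zf.open(info) as member:
                frame = pd.read_csv(member)
            # Hypothetical extra column so rows can be traced back to a file.
            frame["source_file"] = info.filename
            frames.append(frame)
    # Note: pd.concat raises ValueError if no CSV members were found.
    return pd.concat(frames, ignore_index=True)


# Demo archive built in memory so the sketch runs without any files on disk.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("a.csv", "id,score\n1,10\n2,20\n")
    zf.writestr("b.csv", "id,score\n3,30\n")
    zf.writestr("notes.txt", "not a csv")

combined = read_all_csvs(demo)
print(combined)
```

Using ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.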

Yes. You want the zipfile module.
You open the zip file itself with zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
You can then use ZipFile.infolist() to enumerate each file within the zip (it returns one ZipInfo object per member), and read a member without extracting it to disk with ZipFile.open(name[, mode[, pwd]])
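Those steps might look roughly like this. It is only a sketch: the helper name `read_members` and the demo member names are made up, and it assumes the members are UTF-8 text.

```python
import io
import zipfile


def read_members(zip_source):
    """Return {member name: decoded text} for every file in the archive."""
    contents = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # infolist() can include directory entries; skip them.
            if info.is_dir():
                continue
            # open() streams the member; nothing is written to disk.
            with zf.open(info) as member:
                contents[info.filename] = member.read().decode("utf-8")
    return contents


# Demo archive built in memory (member names are hypothetical).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
    zf.writestr("sub/b.csv", "x,y\n3,4\n")

members = read_members(demo)
print(sorted(members))
```

For large members you would normally iterate over the opened stream line by line instead of calling read(), so that only a small part is held in memory at a time.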

This is the simplest approach, and the one I always use:
import pandas as pd
df = pd.read_csv("Train.zip", compression='zip')

Suppose you are downloading a zip file that contains a CSV and you don't want to use temporary storage. Here is a sample implementation:
#!/usr/bin/env python3
from csv import DictReader
from io import TextIOWrapper, BytesIO
from zipfile import ZipFile
import requests
def all_tickers():
    url = "https://simfin.com/api/bulk/bulk.php?dataset=industries&variant=null"
    r = requests.get(url)
    zip_ref = ZipFile(BytesIO(r.content))
    for name in zip_ref.namelist():
        print(name)
        with zip_ref.open(name) as file_contents:
            reader = DictReader(TextIOWrapper(file_contents, 'utf-8'), delimiter=';')
            for item in reader:
                print(item)
This takes care of all the Python 3 bytes/str issues.

-
This is the only answer here that handles in-memory zips; none of the others do – Joy Sep 02 '22 at 14:01
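The same in-memory idea can be tried without a network call. In the sketch below the zip bytes are built locally and stand in for something like `r.content`; the function name `iter_csv_rows`, the member name `tickers.csv`, and the sample rows are all hypothetical.

```python
from csv import DictReader
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile


def iter_csv_rows(zip_bytes, delimiter=","):
    """Yield (member name, row dict) pairs from a zip archive held in memory."""
    with ZipFile(BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as member:
                # Decode the binary member stream so DictReader gets str lines.
                text = TextIOWrapper(member, encoding="utf-8", newline="")
                for row in DictReader(text, delimiter=delimiter):
                    yield name, row


# Stand-in for downloaded bytes: build a small archive entirely in memory.
payload = BytesIO()
with ZipFile(payload, "w") as archive:
    archive.writestr("tickers.csv", "ticker;industry\nABC;Software\nXYZ;Retail\n")

rows = list(iter_csv_rows(payload.getvalue(), delimiter=";"))
for name, row in rows:
    print(name, row)
```

Because the function is a generator, rows are decoded one at a time, although the compressed archive itself still has to fit in memory.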
Pandas has natively supported compressed CSV files since version 0.18.1: read_csv has a compression parameter that accepts {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, with 'infer' as the default.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

If you have a file named my_big_file.csv and you zip it under the same name, my_big_file.zip, you can simply do this:
df = pd.read_csv("my_big_file.zip")
Note: check your pandas version first (this does not work on older versions).
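A quick way to try this end to end, as a sketch only: the archive is written to a temporary directory with the standard zipfile module and read straight back with pandas. The sample rows are invented, and it assumes a pandas version recent enough to infer zip compression from the ".zip" extension.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Made-up CSV content standing in for my_big_file.csv.
csv_text = "id,name\n1,a\n2,b\n3,c\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "my_big_file.zip")

    # Zip a single CSV member; pandas expects exactly one file in the archive.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("my_big_file.csv", csv_text)

    # The default compression='infer' picks zip from the file extension.
    df = pd.read_csv(zip_path)

print(df)
```

If the archive holds more than one file, read_csv cannot tell which one to load, so opening the member yourself with zipfile is the safer route in that case.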
