Read multiple csv files zipped in one file

Question

I have several csv files in several zip files in one folder, for example:

A.zip (contains csv1,csv2,csv3)
B.zip (contains csv4, csv5, csv6)

These zip files are in the folder C:/Folder/. When I load plain csv files from a folder, I use the following code:

import glob
import pandas as pd

# Read every csv in the folder and stack them into one DataFrame
files = glob.glob("C:/Folder/*.csv")
dfs = [pd.read_csv(f, header=None, sep=";") for f in files]

df = pd.concat(dfs, ignore_index=True)

Following this post, Reading csv zipped files in python, a single csv inside a zip can be read like this:

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip') 
df = pd.read_csv(zf.open('intfile.csv'))

Any idea how to turn this into a loop that reads every csv from every zip file?
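For the single-csv case shown above, pandas can also open the zip directly: `read_csv` infers zip compression from the `.zip` extension (`compression="infer"` is the default), but only when the archive contains exactly one data file. A minimal sketch, assuming a hypothetical `THEZIPFILE.zip` built in a temporary folder just for the demo:

```python
import os
import tempfile
import zipfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for THEZIPFILE.zip holding a single csv
    zip_path = os.path.join(tmp, "THEZIPFILE.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("intfile.csv", "1;2\n3;4\n")

    # compression="infer" (the default) detects zip from the ".zip" extension
    df = pd.read_csv(zip_path, header=None, sep=";")

print(df)
```

This shortcut does not help with A.zip or B.zip from the question, since each of those holds several csv files; for those you still need `zipfile` and `namelist()`.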

[ZipFile.namelist()](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.namelist) should give you a list of all files inside the .zip, so you can use this list in a for loop to read each .csv from the .zip. – furas, Jun 27 '19 at 07:56
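One detail worth knowing when looping over `namelist()`: it returns every member of the archive, which can include folder entries (names ending in `/`) and non-csv files. A minimal sketch with a hypothetical in-memory archive, filtering on the `.csv` extension before calling `read_csv`:

```python
import io
import zipfile

import pandas as pd

# Hypothetical archive: two csv files plus a folder entry and a readme
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("csv1.csv", "1;a\n")
    zf.writestr("sub/", "")  # explicit directory entry
    zf.writestr("sub/csv2.csv", "2;b\n")
    zf.writestr("readme.txt", "not a csv")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()  # every member, in archive order
    # Keep only the csv members; folder entries and other files are skipped
    csv_names = [n for n in names if n.lower().endswith(".csv")]
    dfs = [pd.read_csv(zf.open(n), header=None, sep=";") for n in csv_names]

df = pd.concat(dfs, ignore_index=True)
print(names)
print(csv_names)
```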

Rakesh · Accepted Answer · 2019-06-27T08:03:49.350

score 7

Use ZipFile.namelist() to get the list of files inside each zip.

Example:

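import glob
import zipfile
import pandas as pd

dfs = []
for zip_file in glob.glob("C:/Folder/*.zip"):
    # Open each archive and read every .csv member it contains
    with zipfile.ZipFile(zip_file) as zf:
        for f in zf.namelist():
            if f.lower().endswith(".csv"):
                dfs.append(pd.read_csv(zf.open(f), header=None, sep=";"))

# Combine the csv files from all zip files into one DataFrame
df = pd.concat(dfs, ignore_index=True)
print(df)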
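The same `namelist()` loop can be wrapped in a reusable helper that also records where each row came from, which helps when the combined DataFrame needs to be traced back to A.zip or B.zip. A minimal sketch; `read_zipped_csvs` and the `source_zip` / `source_csv` column names are hypothetical, and the demo builds throwaway archives mirroring the question:

```python
import glob
import os
import tempfile
import zipfile

import pandas as pd


def read_zipped_csvs(pattern, **read_csv_kwargs):
    """Hypothetical helper: read every .csv in every zip matching `pattern`.

    Adds `source_zip` and `source_csv` columns so each row can be traced
    back to the archive and member it came from.
    """
    frames = []
    for zip_path in sorted(glob.glob(pattern)):
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                if not name.lower().endswith(".csv"):
                    continue  # skip folder entries and non-csv members
                with zf.open(name) as member:
                    frame = pd.read_csv(member, **read_csv_kwargs)
                frames.append(
                    frame.assign(source_zip=os.path.basename(zip_path), source_csv=name)
                )
    return pd.concat(frames, ignore_index=True)


# Demo with hypothetical archives shaped like A.zip and B.zip
with tempfile.TemporaryDirectory() as folder:
    layout = {"A.zip": ["csv1.csv", "csv2.csv"], "B.zip": ["csv4.csv"]}
    for zip_name, members in layout.items():
        with zipfile.ZipFile(os.path.join(folder, zip_name), "w") as zf:
            for i, member in enumerate(members):
                zf.writestr(member, f"{i};x\n")
    df = read_zipped_csvs(os.path.join(folder, "*.zip"), header=None, sep=";")

print(df)
```

Note that `pd.concat` raises a `ValueError` if no csv files were found at all, so you may want to check `frames` before concatenating.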

edited Jun 27 '19 at 08:03

answered Jun 27 '19 at 07:59


I have several zip files, not only one. – PV8 Jun 27 '19 at 08:00

In that case, just loop over the list of zip files first. – Rakesh Jun 27 '19 at 08:02

score 1 · Answer 2 · answered Jun 27 '19 at 07:56

I would tackle it in two passes. First pass: extract the contents of each zip file onto the filesystem. Second pass: read all the extracted CSVs using the method you already have above:

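import glob
import os
import zipfile

import pandas as pd

def extract_files(file_path, target_root):
    # Give each archive its own subfolder so same-named csv files don't overwrite each other
    name = os.path.splitext(os.path.basename(file_path))[0]
    target_dir = os.path.join(target_root, name)
    with zipfile.ZipFile(file_path, "r") as archive:
        archive.extractall(target_dir)  # extractall() returns None, so return the folder ourselves
    return target_dir

# First pass: extract every zip (C:/Folder/extracted is a hypothetical output folder)
zipped_files = glob.glob("C:/Folder/*.zip")
extracted_dirs = [extract_files(zf, "C:/Folder/extracted") for zf in zipped_files]

# Second pass: read all extracted csv files, including any in subfolders of the archives
file_paths = [f for d in extracted_dirs for f in glob.glob(os.path.join(d, "**", "*.csv"), recursive=True)]
dfs = [pd.read_csv(f, header=None, sep=";") for f in file_paths]
df = pd.concat(dfs, ignore_index=True)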
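If the extracted files are only needed long enough to read them, the two-pass idea can extract into a temporary folder that is deleted automatically afterwards. A minimal sketch; `read_via_extraction` is a hypothetical helper, and the demo uses two throwaway archives that both contain a member named `data.csv` to show why each archive gets its own subfolder:

```python
import glob
import os
import tempfile
import zipfile

import pandas as pd


def read_via_extraction(zip_paths, **read_csv_kwargs):
    # Extract into a throwaway folder that is removed when the with-block exits
    with tempfile.TemporaryDirectory() as tmp:
        for i, zip_path in enumerate(zip_paths):
            # One subfolder per archive so identical member names can't clash
            target = os.path.join(tmp, str(i))
            with zipfile.ZipFile(zip_path) as archive:
                archive.extractall(target)
        csv_paths = sorted(glob.glob(os.path.join(tmp, "**", "*.csv"), recursive=True))
        # Read everything before the temporary folder is deleted
        dfs = [pd.read_csv(p, **read_csv_kwargs) for p in csv_paths]
    return pd.concat(dfs, ignore_index=True)


# Demo: two hypothetical archives whose members share the same name
with tempfile.TemporaryDirectory() as folder:
    paths = []
    for zip_name in ("A.zip", "B.zip"):
        path = os.path.join(folder, zip_name)
        with zipfile.ZipFile(path, "w") as zf:
            zf.writestr("data.csv", f"{zip_name};1\n")
        paths.append(path)
    df = read_via_extraction(paths, header=None, sep=";")

print(df)
```

Extracting to disk mostly pays off when you need the raw files afterwards; otherwise reading members straight from the archive with `ZipFile.open()` avoids the extra disk writes. The `zipfile` documentation also warns against extracting archives from untrusted sources without inspecting them first.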
