I have a folder containing several zip files, e.g.:

Folder path: C:\Users\FolderA
Files in the folder: A.Zip, B.Zip,....,Z.Zip

These zip files are all protected with the same password: lordoftherings

How can I load the files from all of these zip files into one dataframe? (Note that every zip file contains exactly one csv file.)

So far I only know how to load multiple csv files, and I know how to load a single zip file:

zf = zipfile.ZipFile('C:/...')
dfClearstream = pd.read_csv(zf.open('....csv'), sep=';')

So the desired outcome would be one dataframe in pandas.

PV8

2 Answers


Based on your question, an example of loading multiple csv files, and an example of opening a password-protected zip file, you can write code like the following:

If you have A.csv in A.zip, B.csv in B.zip, ...

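import glob
import os
import pandas as pd
import zipfile

password = b'lordoftherings' # Set password (must be bytes)

# Get list of zip files; the raw string keeps "\U" from being parsed as an escape sequence
zipfiles = glob.glob(r"C:\Users\FolderA\*.zip")

# Get ZipFile object and csv file name for each zip file (assumes A.zip contains A.csv)
zfs = [(zipfile.ZipFile(f), os.path.splitext(os.path.basename(f))[0] + '.csv') for f in zipfiles]

# Open each csv with the password and read it;
# header=None treats the first row as data, so drop it if the files have a header row
dfs = [pd.read_csv(zf.open(filename, 'r', password), header=None, sep=';') for zf, filename in zfs]

# Combine everything into one dataframe
salesdata = pd.concat(dfs, ignore_index=True)
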
Jungho Cho
  • I get the error: KeyError: "There is no item named 'A_02.csv' in the archive" – PV8 Mar 06 '20 at 07:05
  • You need to describe the structure of your zip file. If the csv file sits inside a folder within the archive, you need to add the folder name to the filename when unzipping. – Jungho Cho Mar 07 '20 at 10:59
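
The KeyError in the comment above suggests that the csv inside the archive is not named after the zip file. A minimal sketch for checking what an archive actually contains is shown here; the helper name csv_members is hypothetical and not part of the answer. With the traditional zip encryption that the standard zipfile module can read, listing member names should not need the password, only reading the contents does.

```python
import zipfile


def csv_members(zip_path):
    """Return the names of all .csv entries in the archive, including any folder prefix."""
    with zipfile.ZipFile(zip_path) as zf:
        # namelist() gives the full internal path of each member, e.g. "folder/A_02.csv"
        return [name for name in zf.namelist() if name.lower().endswith(".csv")]
```

Whatever name this returns (folder prefix included) is the name to pass to zf.open.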

You can create a list of dataframes and then concatenate them with:

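import zipfile
import pandas as pd

dfs = []

# Read every member of the archive into its own dataframe
with zipfile.ZipFile('my_zip.zip') as zf:
    for file in zf.namelist():
        # For a password-protected archive, pass pwd=b'...' to zf.open
        dfs.append(pd.read_csv(zf.open(file), sep=';'))

# Stack the dataframes into one
df = pd.concat(dfs)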
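
To cover the whole folder from the question, the same idea can be extended to loop over every zip file and pass the shared password. This is only a sketch under some assumptions: the function name load_zipped_csvs and its parameters are made up for illustration, each archive is expected to hold exactly one csv (as the question states), and the csv files are assumed to share the same columns. Note that the "*.zip" pattern is matched case-sensitively on some platforms.

```python
import glob
import os
import zipfile

import pandas as pd


def load_zipped_csvs(folder, password, sep=";"):
    """Read the single csv inside every zip file in `folder` into one dataframe."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(path) as zf:
            # Look the csv up by extension instead of guessing its name from the zip name
            csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if len(csv_names) != 1:
                raise ValueError(f"{path}: expected exactly one csv, found {len(csv_names)}")
            # pwd must be bytes; it is only used for encrypted members
            with zf.open(csv_names[0], pwd=password) as fh:
                frames.append(pd.read_csv(fh, sep=sep))
    # ignore_index=True renumbers the rows of the combined dataframe from 0
    return pd.concat(frames, ignore_index=True)
```

With the values from the question, the call would look like df = load_zipped_csvs(r"C:\Users\FolderA", b"lordoftherings").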
villoro