I am using the method below to read in an entire folder of CSVs. These CSVs are dropped in a folder every day and reflect activity that has occurred. When no activity occurs, the CSV is blank but still has a size of 1KB. How can I change this script to skip empty files that still have a file size? Currently I get the error:

EmptyDataError: No columns to parse from file

Current code:

import glob
import os
import pandas as pd

os.chdir('file_path')
file_extension = '.csv'
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
df = pd.concat([pd.read_csv(t) for t in all_filenames], ignore_index=True,sort=False,axis=0)

2 Answers

You can handle the exception and build a list of dataframes to feed into the concat function. There's probably a way to write this as a one-liner, but I don't know it. So here's the long version:

import glob
import os
import pandas as pd

os.chdir('file_path')
file_extension = '.csv'
all_filenames = [i for i in glob.glob(f"*{file_extension}")]

all_dataframes = []
for t in all_filenames:
    try:
        df = pd.read_csv(t)
        all_dataframes.append(df)
    except pd.errors.EmptyDataError:
        print("empty csv encountered")
        # if for some reason you would prefer an empty dataframe
        #df = pd.DataFrame()
        #all_dataframes.append(df)

df = pd.concat(all_dataframes, ignore_index=True,sort=False,axis=0)
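
The loop above can also be wrapped in a small helper, which gets close to the one-liner mentioned earlier. This is only a sketch: `read_csv_or_none` and `concat_non_empty` are hypothetical names, and it assumes a pandas version that exposes the exception as `pd.errors.EmptyDataError`. It also guards against the case where every file is empty, because `pd.concat` raises a `ValueError` when given an empty list.

```python
import pandas as pd


def read_csv_or_none(path):
    """Return the parsed CSV, or None if pandas finds no columns in it."""
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError:
        return None


def concat_non_empty(paths):
    """Concatenate every readable CSV in paths, skipping empty files."""
    frames = [df for df in map(read_csv_or_none, paths) if df is not None]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True, sort=False)
```

With this in place, the call would look like `df = concat_non_empty(glob.glob("*.csv"))`.
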
Sakib Abrar

Following the accepted answer to "How to check whether a file is empty or not", we can tell whether a file is truly empty and simply skip it.

I assume your files look like this:

file1.csv
=========
Col1,Col2
a,1
b,2

file2.csv
=========

file3.csv
=========
Col1,Col2
c,3
d,4

file2.csv is blank and has no header row; otherwise pandas would not raise the "No columns to parse from file" exception.
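
The difference between a blank file and a header-only file can be checked without touching the disk. The sketch below feeds two in-memory strings to `pd.read_csv` through `io.StringIO`; the sample contents are made up for illustration, and it assumes a pandas version that exposes `pd.errors.EmptyDataError`.

```python
import io

import pandas as pd

# A header-only CSV parses fine and yields a DataFrame with zero rows.
header_only = pd.read_csv(io.StringIO("Col1,Col2\n"))

# A completely blank CSV has no header line, so pandas raises EmptyDataError.
try:
    pd.read_csv(io.StringIO(""))
    blank_raised = False
except pd.errors.EmptyDataError:
    blank_raised = True
```
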

import glob
import os

import pandas as pd

file_extension = ".csv"

all_filenames = []
for csv_file in glob.glob(f"*{file_extension}"):
    if os.stat(csv_file).st_size == 0:
        continue
    all_filenames.append(csv_file)

df = pd.concat([pd.read_csv(t) for t in all_filenames], ignore_index=True, sort=False, axis=0)

and I get:

  Col1  Col2
0    a     1
1    b     2
2    c     3
3    d     4
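
The question mentions that the blank files still report a size of 1KB, in which case `st_size == 0` would not skip them. If such a file contains only whitespace or a byte-order mark (an assumption, since the question does not show the file contents), a content check may work instead. The sketch below uses two hypothetical helpers, `is_blank_file` and `non_blank_csvs`, and relies only on the standard library.

```python
import glob


def is_blank_file(path, encoding="utf-8-sig"):
    """Return True if the file holds nothing but whitespace (or only a BOM)."""
    # "utf-8-sig" drops a leading byte-order mark, if there is one.
    with open(path, encoding=encoding) as handle:
        # Stop at the first line that has visible content.
        return not any(line.strip() for line in handle)


def non_blank_csvs(pattern="*.csv"):
    """List the files matching pattern that contain some non-whitespace text."""
    return [name for name in glob.glob(pattern) if not is_blank_file(name)]
```

The result of `non_blank_csvs()` could then be used in place of `all_filenames` in the `pd.concat` call. Combining it with a `try`/`except` around `pd.read_csv` may still be the safer option if the files can be malformed in other ways.
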
Zach Young