How to read in entire folder of csv's and skip items that do not have any columns

Question

I am using the method below to read in an entire folder of CSVs. These CSVs are dropped into a folder every day and reflect the activity that has occurred. When no activity occurs, the CSV is blank but still shows a size of 1KB. How can I make this script skip empty files that still report a file size? Currently I get the error:

EmptyDataError: No columns to parse from file

Current code:

import glob
import os
import pandas as pd

os.chdir('file_path')
file_extension = '.csv'
all_filenames = glob.glob(f"*{file_extension}")
df = pd.concat([pd.read_csv(t) for t in all_filenames], ignore_index=True, sort=False, axis=0)
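To see where the error comes from, here is a minimal reproduction (not from the original post) that feeds in-memory text to `pd.read_csv` instead of real files. `empty_data_error` is a hypothetical helper name. It also bears on the header-row question in the comments: a file with only a header row parses without error.

```python
import io

import pandas as pd


def empty_data_error(text):
    """Return the EmptyDataError message pandas raises for CSV `text`, or None if it parses."""
    try:
        pd.read_csv(io.StringIO(text))
    except pd.errors.EmptyDataError as exc:
        return str(exc)
    return None


# A zero-byte file has no header row, so pandas cannot infer any columns.
print(empty_data_error(""))  # -> No columns to parse from file
# A header-only file parses into an empty DataFrame with columns Col1 and Col2.
print(empty_data_error("Col1,Col2\n"))  # -> None
```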

Michael, do the .csv files contain a header row? Does each row have multiple columns of data? — dtadams79, Sep 15 '22 at 15:27
Like what @dtadams79 is getting at, what does an empty file look like? Please edit your post and include a sample. — Zach Young, Sep 15 '22 at 17:57

score 1 · Answer 1 · answered Sep 15 '22 at 15:54

You can always handle the exception and build a list of dataframes to feed into the concat function. There's probably a way to write this as a one-liner, but I don't know it, so here's the long version:

import glob
import os

import pandas as pd

os.chdir('file_path')
file_extension = '.csv'
all_filenames = glob.glob(f"*{file_extension}")

all_dataframes = []
for t in all_filenames:
    try:
        df = pd.read_csv(t)
        all_dataframes.append(df)
    except pd.errors.EmptyDataError:
        # raised when the file has no header row and no data
        print(f"empty csv encountered: {t}")
        # if for some reason you would prefer an empty dataframe
        # df = pd.DataFrame()
        # all_dataframes.append(df)

df = pd.concat(all_dataframes, ignore_index=True, sort=False, axis=0)
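For the more compact version, one option is to move the try/except into a small helper and filter inside a list comprehension. This is a sketch, not part of the original answer: `read_csv_or_none` and `concat_csv_folder` are hypothetical names, and it reads from a folder path instead of calling `os.chdir`. It also guards against every file being empty, because `pd.concat` raises `ValueError` when it has no objects to concatenate.

```python
import glob
import os

import pandas as pd


def read_csv_or_none(path):
    """Read one CSV, returning None instead of raising when it has no columns."""
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError:
        return None


def concat_csv_folder(folder, file_extension=".csv"):
    """Concatenate every parseable CSV in `folder`, skipping empty ones."""
    paths = sorted(glob.glob(os.path.join(folder, f"*{file_extension}")))
    frames = [df for df in map(read_csv_or_none, paths) if df is not None]
    # pd.concat raises ValueError on an empty list, so return an empty frame instead.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True, sort=False)
```

Usage would be `df = concat_csv_folder('file_path')`. Sorting the glob results keeps the row order stable between runs.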

Zach Young · Answer 2 · 2022-09-15T18:18:46.490

Following the accepted answer to "How to check whether a file is empty or not", we can tell whether a file is truly empty (zero bytes) and just skip it.

I assume your files look like this:

file1.csv
=========
Col1,Col2
a,1
b,2

file2.csv
=========

file3.csv
=========
Col1,Col2
c,3
d,4

file2.csv is blank and has no header row; otherwise pandas would not raise the "No columns to parse from file" exception.

import glob
import os

import pandas as pd

file_extension = ".csv"

all_filenames = []
for csv_file in glob.glob(f"*{file_extension}"):
    # Skip zero-byte files: pd.read_csv raises EmptyDataError on them.
    if os.stat(csv_file).st_size == 0:
        continue
    all_filenames.append(csv_file)

df = pd.concat([pd.read_csv(t) for t in all_filenames], ignore_index=True, sort=False, axis=0)

and I get:

  Col1  Col2
0    a     1
1    b     2
2    c     3
3    d     4
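One caveat: `os.stat(...).st_size == 0` only catches files that are truly zero bytes. The question mentions blank files that still show a size of 1KB; if those contain, say, a lone newline or other whitespace, the size check lets them through and `pd.read_csv` may still raise `EmptyDataError`. Below is a sketch of a stricter check, assuming "blank" means whitespace-only; `is_blank` and `concat_non_blank` are hypothetical names, not part of the original answer.

```python
import glob
import os

import pandas as pd


def is_blank(path):
    """True if the file is zero bytes or contains only whitespace lines."""
    if os.stat(path).st_size == 0:
        return True
    with open(path, "rb") as fh:
        # all() stops at the first line with non-whitespace content,
        # so large data files are not read in full.
        return all(not line.strip() for line in fh)


def concat_non_blank(folder, file_extension=".csv"):
    """Concatenate the CSVs in `folder`, skipping zero-byte and whitespace-only files."""
    paths = sorted(glob.glob(os.path.join(folder, f"*{file_extension}")))
    frames = [pd.read_csv(p) for p in paths if not is_blank(p)]
    return pd.concat(frames, ignore_index=True, sort=False) if frames else pd.DataFrame()
```

If the "blank" files turn out to contain something other than whitespace, combining this with the try/except approach from the first answer is the safer fallback.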
