I have a list of .csv files stored in a local folder and I'm trying to concatenate them into one single dataframe.

Here is the code I'm using:

import pandas as pd
import os

folder = r'C:\Users\_M92\Desktop\myFolder'

df = pd.concat([pd.read_csv(os.path.join(folder, f), delimiter=';') for f in os.listdir(folder)])
display(df)

The only problem is that one of the files is sometimes empty (0 columns, 0 rows), in which case pandas throws an `EmptyDataError: No columns to parse from file` at line 6.

Do you have any suggestions for skipping the empty CSV file?
I would also welcome a simpler or more efficient way to concatenate the CSV files.

Ideally, I would also like to add a column (to the dataframe df) to carry the name of each .csv.

ASGM
Timeless

2 Answers

3

You can check if a file is empty with:

import os

os.stat(FILE_PATH).st_size == 0
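
As a minimal sketch of this check in isolation, the snippet below wraps it in a helper and tries it on two throwaway files. The helper name `is_zero_byte` and the file names are hypothetical, chosen only for illustration. It also shows a limitation worth keeping in mind: a file that holds nothing but a newline is not zero bytes, so the size check does not flag it, and pandas may still find no columns to parse in such a file.

```python
import os
import tempfile


def is_zero_byte(path):
    """Return True when the file at `path` contains no bytes at all."""
    return os.stat(path).st_size == 0


# Build two throwaway files: one truly empty, one holding only a newline.
tmp_dir = tempfile.mkdtemp()
empty_path = os.path.join(tmp_dir, "empty.csv")
blank_path = os.path.join(tmp_dir, "blank.csv")

with open(empty_path, "wb"):
    pass  # zero bytes written
with open(blank_path, "wb") as fh:
    fh.write(b"\n")  # one byte, but no header and no data

print(is_zero_byte(empty_path))  # True
print(is_zero_byte(blank_path))  # False: the size check lets this file through
```
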

In your use case:

import os
import pandas as pd

# `folder` is the directory path defined in the question
df = pd.concat([
    pd.read_csv(os.path.join(folder, f), delimiter=';')
    for f in os.listdir(folder)
    if os.stat(os.path.join(folder, f)).st_size != 0  # skip zero-byte files
])
ASGM
  • Hi @ASGM, Unfortunately, I'm still getting the same `EmptyDataError: No columns to parse from file`. – Timeless Aug 14 '22 at 14:45
1

Personally, I would read each file inside a basic try-except, skip the ones that raise `EmptyDataError`, and then concatenate the rest.

import pandas as pd
import os

folder = r'C:\Users\_M92\Desktop\myFolder'
data = []

for f in os.listdir(folder):
    try:
        temp = pd.read_csv(os.path.join(folder, f), delimiter=';')
        # adding original filename column as per request
        temp['origin'] = f
        data.append(temp)
    except pd.errors.EmptyDataError:
        continue

df = pd.concat(data)

display(df)
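
The loop above could also be wrapped in a reusable function. The following is only a sketch under a few assumptions: the function name `concat_csv_folder` is hypothetical, only files matching `*.csv` are read (the original loop reads every entry in the folder), and the index is reset with `ignore_index=True` so that row labels do not repeat across files.

```python
from pathlib import Path

import pandas as pd


def concat_csv_folder(folder, sep=";"):
    """Concatenate every readable .csv file in `folder` into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        try:
            frame = pd.read_csv(path, sep=sep)
        except pd.errors.EmptyDataError:
            # pandas found no columns to parse, so skip this file
            continue
        # Record which file each row came from, as requested in the question
        frame["origin"] = path.name
        frames.append(frame)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the folder from the question, this would be called as `df = concat_csv_folder(folder)`.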
  • Hi @Wonhyeong Seo, your code works fine. Thank you so much. Did you see the second part of my post? I would like to add a column to `df` with the name of each .csv file. Is there a way to make that happen? – Timeless Aug 14 '22 at 14:48
  • @L'Artiste Hello! Glad that it worked! I added the column part just now. Could you please check the `temp['origin'] = f` line? – Wonhyeong Seo Aug 14 '22 at 14:50
  • You're welcome! Thank you for your interesting question! I had a lot of fun researching it :D – Wonhyeong Seo Aug 14 '22 at 14:53