Concatenating multiple dataframes. Issue with datapaths

Question

I want to concatenate several csv files which I saved in a directory ./Errormeasure. In order to do so, I used the following answer from another thread https://stackoverflow.com/a/51118604/9109556

filepaths =[f for f in listdir('./Errormeasure')if f.endswith('.csv')]
df=pd.concat(map(pd.read_csv,filepaths))
print(df)

However, this code only works when the csv files I want to concatenate are in both the ./Errormeasure directory and the directory below, ./venv. This is obviously not convenient. When the csv files are only in ./Errormeasure, I receive the following error:

FileNotFoundError: [Errno 2] File b'errormeasure_871687110001543570.csv' does not exist: b'errormeasure_871687110001543570.csv'

Can you give me some tips to tackle this problem? I am using pycharm. Thanks in advance!

Please include all `import` lines. Likely you need to map the folder path with file names. — Parfait, Jun 04 '19 at 15:18
The csv files are saved in here: `L:\Graduation\Pythonfiles\Errormeasures\venv\Errormeasure`(here I save only the csv files which I want to retrieve.), while the code is situated here: `L:\Graduation\Pythonfiles\Errormeasures\venv` — jonasa, Jun 05 '19 at 08:02

Parfait · Accepted Answer · 2019-06-05T12:56:21.457

os.listdir() returns only file names, without the parent folder. pandas.read_csv() needs a path it can resolve, either relative to the current working directory (usually where the script runs) or absolute, so the folder has to be joined to each file name.
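As a minimal sketch of that point, the snippet below shows that os.listdir() yields bare names and that joining each name with the folder gives a usable path. The helper name `csv_paths` and the temporary test files are assumptions made for illustration; they are not part of the original question.

```python
import os
import tempfile


def csv_paths(directory):
    """Return paths of the .csv files directly inside `directory` (hypothetical helper)."""
    # os.listdir() yields bare names such as 'a.csv'; joining each name
    # with the directory gives a path that resolves from any working directory.
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".csv")
    )


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    bare_names = sorted(os.listdir(tmp))   # names only, no folder
    full_paths = csv_paths(tmp)            # folder + name
```

With pandas imported, the question's code could then probably be written as `pd.concat(map(pd.read_csv, csv_paths('./Errormeasure')))`.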

Instead, consider the recursive feature of the built-in glob module (only available in Python 3.5+), which returns the full paths of all csv files at the top level and in subfolders. Here dirpath stands for the folder to search, e.g. './Errormeasure'.

import glob

for f in glob.glob(dirpath + "/**/*.csv", recursive=True):
    print(f)

From there, build the data frames in a list comprehension (bypassing map; see "List comprehension vs map") and concatenate them with pd.concat:

df_files = [pd.read_csv(f) for f in glob.glob(dirpath + "/**/*.csv", recursive=True)]
df = pd.concat(df_files)
print(df)

For Python < 3.5, consider os.walk() + os.listdir() to retrieve full paths of csv files:

import os
import pandas as pd

# COMBINE CSVs IN CURR FOLDER + SUB FOLDERS
fpaths = [os.path.join(dirpath, f) 
            for f in os.listdir(dirpath) if f.endswith('.csv')] + \
         [os.path.join(fdir, fld, f) 
            for fdir, flds, ffile in os.walk(dirpath) 
            for fld in flds  
            for f in os.listdir(os.path.join(fdir, fld)) if f.endswith('.csv')]

df = pd.concat([pd.read_csv(f) for f in fpaths])
print(df)
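The same search can also be sketched with os.walk() alone, since each step of the walk already lists the files in the folder it visits. This is a simplified variant and not the answer's exact code; the helper name `find_csv_files` and the temporary folder layout are assumptions for illustration.

```python
import os
import tempfile


def find_csv_files(root):
    """Collect .csv paths under `root`, including all subfolders (hypothetical helper)."""
    paths = []
    # os.walk() yields (folder, subfolder names, file names) for every folder
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".csv"):
                paths.append(os.path.join(folder, name))
    return sorted(paths)


# Small demonstration with nested folders in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deeper"))
    for rel in ("top.csv",
                os.path.join("sub", "mid.csv"),
                os.path.join("sub", "deeper", "low.csv"),
                "skip.txt"):
        open(os.path.join(tmp, rel), "w").close()
    found = [os.path.relpath(p, tmp) for p in find_csv_files(tmp)]
```

The resulting list can be passed to pd.read_csv one path at a time, as in the comprehension above.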

Great! That did the job. You just have a typo in your second part of code: It has to be `df_files=[pd.read_csv(f)...` — jonasa, Jun 05 '19 at 08:58
Sounds good. Glad to help. Whoops! R uses `read.csv` and mistakenly I forgot to code-switch for Pandas. — Parfait, Jun 05 '19 at 12:58

Mahsa Hassankashi · Answer 2 · 2019-06-04T13:05:33.080

import pandas as pd
import glob

path = r'C:\Directory' # use your path
files = glob.glob(path + "/*.csv")

# collect one data frame per csv file (avoid naming the variable `list`,
# which would shadow the built-in)
frames = []

for file in files:
    df = pd.read_csv(file, index_col=None, header=0)
    frames.append(df)

frame = pd.concat(frames, axis=0, ignore_index=True)

On Windows you may need '\' instead of '/' as the separator; os.path.join picks the right one for you:

import os

files = glob.glob(os.path.join(path, '*.csv'))
print(files)

You can then loop over these files as in the code above.

Thanks for your answer. However, I get the error `ValueError: No objects to concatenate` now. Do you know a work around for that? – jonasa Jun 04 '19 at 12:35
@jonasa Do you have duplicate column names? If yes, getting rid of those duplicates will solve your problem. – Mahsa Hassankashi Jun 04 '19 at 12:46
@jonasa if the answer helped you, please accept it. – Mahsa Hassankashi Jun 04 '19 at 12:48
Not within each dataframe. However, the different df's all have the same column names (and I can't change those) – jonasa Jun 04 '19 at 12:49
Change that because they are merging together, or try to remove them implicitly by np.delete(your first header) – Mahsa Hassankashi Jun 04 '19 at 12:51
Okay. But I already achieved this with the code I provided above. It worked and only raised an error when I saved the files in another folder, `./Errormeasure`. So I assume I only have to sort out the filepath issue... – jonasa Jun 04 '19 at 12:57
