merge multiple CSV files continiously

Question

I have 1000 csv files with same columns names. i want to merge them respectively. I am using below code, but it merge all csv files randomly.

files = os.path.join(path_files, "*_a.csv")
files = glob.glob(files)
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

for example it put first 1000_a.csv then 1_a.csv and etc. But i want to merge respectively and then remove first 100 of them.

like this as a dataframe or a single csv file:

1_a.csv, 2_a.csv, 3_a.csv, ..., 1000_a.csv

could you please let me know how it is possible?

@jezrael 847_a.csv, 45_a.csv, 2454_a.csv, 1345_a.csv, and 1266_a.csv — Rasoul Sadeghi, Sep 28 '22 at 08:52

jezrael · Answer 1 · 2022-09-28T09:00:53.767

0

You can sort filenames by integers before _, or remove _a.csv or last 6 characters:

files = os.path.join(path_files, "*_a.csv")

files = sorted(glob.glob(files), key=lambda x: int(x.split('_')[0]))
#alternative1
#files = sorted(glob.glob(files), key=lambda x: int(x.replace('_a.csv','')))
#alternative2
#files = sorted(glob.glob(files), key=lambda x: int(x[:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

edited Sep 28 '22 at 09:00

answered Sep 28 '22 at 08:54

jezrael

822,522
95
1,334
1,252

in my project name there is "_" and it shows error. – Rasoul Sadeghi Sep 28 '22 at 09:04
@RasoulSadeghi - I ask for filenames - `847_a.csv, 45_a.csv, 2454_a.csv, 1345_a.csv, and 1266_a.csv` - it working well. So there is difference? – jezrael Sep 28 '22 at 09:06
ValueError: invalid literal for int() with base 10: '/' – Rasoul Sadeghi Sep 28 '22 at 09:08

clichedmoog · Answer 2 · 2022-09-28T09:37:53.523

0

You shoud re-order glob.glob() results like this:

files_path = os.path.join(base_path, "*_a.csv")
files = sorted(glob.glob(files_path), key=lambda name: int(name[0:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

And there are similar questions about natural sort: Is there a built in function for string natural sort?

edited Sep 28 '22 at 09:37

answered Sep 28 '22 at 08:56

clichedmoog

305
2
7

score 0 · Answer 3 · answered Sep 28 '22 at 08:58

0

i hope it will be usefully for you

import pandas as pd
df = pd.DataFrame()
for csv_file in sorted(list_filenames):
  temp_df = pd.read_csv(scv_file)
  df = pd.concat([df, temp_df])

answered Sep 28 '22 at 08:58

V Z

119
3

score 0 · Answer 4 · answered Sep 28 '22 at 10:14

0

This is an alternative solution.

os.chdir(path_files)
all_filenames = [i for i in sorted(glob.glob('*.{}'.format('csv')))]
df = pd.concat([pd.read_csv(f) for f in all_filenames ]).reset_index()

answered Sep 28 '22 at 10:14

band

129
7

merge multiple CSV files continiously

4 Answers4