0

I have 1000 csv files with same columns names. i want to merge them respectively. I am using below code, but it merge all csv files randomly.

files = os.path.join(path_files, "*_a.csv")
files = glob.glob(files)
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

for example it put first 1000_a.csv then 1_a.csv and etc. But i want to merge respectively and then remove first 100 of them.

like this as a dataframe or a single csv file:

1_a.csv, 2_a.csv, 3_a.csv, ..., 1000_a.csv

could you please let me know how it is possible?

4 Answers4

0

You can sort filenames by integers before _, or remove _a.csv or last 6 characters:

files = os.path.join(path_files, "*_a.csv")

files = sorted(glob.glob(files), key=lambda x: int(x.split('_')[0]))
#alternative1
#files = sorted(glob.glob(files), key=lambda x: int(x.replace('_a.csv','')))
#alternative2
#files = sorted(glob.glob(files), key=lambda x: int(x[:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

You shoud re-order glob.glob() results like this:

files_path = os.path.join(base_path, "*_a.csv")
files = sorted(glob.glob(files_path), key=lambda name: int(name[0:-6]))
df = pd.concat(map(pd.read_csv, files), ignore_index=True)

And there are similar questions about natural sort: Is there a built in function for string natural sort?

clichedmoog
  • 305
  • 2
  • 7
0

i hope it will be usefully for you

import pandas as pd
df = pd.DataFrame()
for csv_file in sorted(list_filenames):
  temp_df = pd.read_csv(scv_file)
  df = pd.concat([df, temp_df]) 
V Z
  • 119
  • 3
0

This is an alternative solution.

os.chdir(path_files)
all_filenames = [i for i in sorted(glob.glob('*.{}'.format('csv')))]
df = pd.concat([pd.read_csv(f) for f in all_filenames ]).reset_index()
band
  • 129
  • 7