I have been using the code from this post to merge CSVs that have the same name but sit in different directories.
Here is a sample of what my CSV files look like:
CSV A (dir1):
distance,x_u,y_u,u_comp
0,0,.01,.001
1,1,.01,.004
2,2,.03,.002
etc.
CSV A (dir2):
distance,x_v,y_v,v_comp
0,0,.01,5
1,1,.01,5.2
2,2,.03,4.98
etc.
What I'm trying to obtain is a CSV like this:
distance,x_u,y_u,u_comp,x_v,y_v,v_comp
0,0,.01,.001,0,.01,5
1,1,.01,.004,1,.01,5.2
2,2,.03,.002,2,.03,4.98
Basically, I'm trying to join the CSVs on the distance value, so that the columns from both files end up side by side in one row per distance.
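To make the target concrete, a join on distance for the sample rows could be sketched roughly like this. It is only an illustration on in-memory data: the u_csv and v_csv strings are stand-ins for the two A.csv files, and the variable names are made up for the example.

```python
import io

import pandas as pd

# Stand-ins for the A.csv files in dir1 and dir2 (sample rows from above).
u_csv = "distance,x_u,y_u,u_comp\n0,0,.01,.001\n1,1,.01,.004\n2,2,.03,.002\n"
v_csv = "distance,x_v,y_v,v_comp\n0,0,.01,5\n1,1,.01,5.2\n2,2,.03,4.98\n"

# The default header handling keeps the first line as column names.
u_df = pd.read_csv(io.StringIO(u_csv))
v_df = pd.read_csv(io.StringIO(v_csv))

# Inner join on the shared "distance" column: one row per distance value,
# with the u columns followed by the v columns.
joined = u_df.merge(v_df, on="distance")
print(joined.to_csv(index=False))
```

An inner join drops any distance that appears in only one file; passing how="outer" to merge would keep those rows and fill the gaps with NaN.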
Here is the code I'm using:
import glob
import pandas as pd
CONCAT_DIR = "/FILES_CONCAT/"
# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("root/*/*")], columns=["fullpath"])
#           fullpath
# 0  root\dir1\A.csv
# 1  root\dir1\B.csv
# 2  root\dir2\A.csv
# 3  root\dir2\B.csv
# Split the full path into directory and filename
# (n is passed by keyword; recent pandas versions reject it as a positional argument)
files_split = files['fullpath'].str.rsplit("/", n=1, expand=True).rename(columns={0: 'path', 1: 'filename'})
#         path filename
# 0  root\dir1    A.csv
# 1  root\dir1    B.csv
# 2  root\dir2    A.csv
# 3  root\dir2    B.csv
# Join these into one DataFrame
files = files.join(files_split)
#           fullpath       path filename
# 0  root\dir1\A.csv  root\dir1    A.csv
# 1  root\dir1\B.csv  root\dir1    B.csv
# 2  root\dir2\A.csv  root\dir2    A.csv
# 3  root\dir2\B.csv  root\dir2    B.csv
# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath']  # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths]  # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs)  # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f)  # Save dataframe
When I run it, I get this instead:
distance,x_u,y_u,u_comp
0,0,.01,.001
1,1,.01,.004
2,2,.03,.002
.....
distance,x_v,y_v,v_comp
0,0,.01,5
1,1,.01,5.2
2,2,.03,4.98
I believe it has something to do with the concat_df = pd.concat(dfs) line, but I'm not sure what to change. I've looked at other examples here, but most of them don't involve a loop and I'm not sure how to make it work.
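For context on that line: pd.concat stacks its inputs row-wise by default (axis=0), and header=None tells read_csv to treat the header line as ordinary data. Together these would account for the stacked output above. One direction that might work is to read the files with their headers and merge on distance inside the loop. The following is a minimal sketch under assumptions: every file has a header row containing a distance column, the remaining column names don't clash between directories, and the output directory is not itself a subdirectory of root. The function name merge_same_named_csvs and its parameters are hypothetical, chosen only for this example.

```python
from functools import reduce
from pathlib import Path

import pandas as pd


def merge_same_named_csvs(root, out_dir, key="distance"):
    """Merge CSVs that share a filename across subdirectories of root on key."""
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group full paths by bare filename,
    # e.g. "A.csv" -> [root/dir1/A.csv, root/dir2/A.csv].
    groups = {}
    for path in sorted(root.glob("*/*.csv")):
        groups.setdefault(path.name, []).append(path)

    for name, paths in groups.items():
        # No header=None here, so the first line becomes the column names.
        dfs = [pd.read_csv(p) for p in paths]
        # Join column-wise on the key instead of stacking rows.
        merged = reduce(lambda left, right: left.merge(right, on=key), dfs)
        # index=False keeps the pandas row index out of the saved file.
        merged.to_csv(out_dir / name, index=False)
```

Calling merge_same_named_csvs("root", "FILES_CONCAT") would then write one merged file per unique filename. Using reduce means the same loop handles two directories or more without changes.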
Thanks in advance!