I have been using this post to merge CSV files that share the same name but live in different directories.

Here is a sample of what my CSV files look like:

csv A (dir1)

distance,x_u,y_u,u_comp
0,0,.01,.001
1,1,.01,.004
2,2,.03,.002
etc.

csv A (dir2)

distance,x_v,y_v,v_comp
0,0,.01,5
1,1,.01,5.2
2,2,.03,4.98
etc.

What I'm trying to obtain is a CSV like this:

distance,x_u,y_u,u_comp,x_v,y_v,v_comp
0,0,.01,.001,0,.01,5
1,1,.01,.004,1,.01,5.2
2,2,.03,.002,2,.03,4.98

Basically, I'm trying to join the CSVs on the distance column, so that rows with the same distance end up side by side.

Here is the code I'm using:

import glob
import pandas as pd

CONCAT_DIR = "/FILES_CONCAT/"

# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("root/*/*")], columns=["fullpath"])

#    fullpath
# 0  root\dir1\A.csv
# 1  root\dir1\B.csv
# 2  root\dir2\A.csv
# 3  root\dir2\B.csv

# Split the full path into directory and filename
files_split = files['fullpath'].str.rsplit("/", n=1, expand=True).rename(columns={0: 'path', 1: 'filename'})

#    path       filename
# 0  root\dir1  A.csv
# 1  root\dir1  B.csv
# 2  root\dir2  A.csv
# 3  root\dir2  B.csv

# Join these into one DataFrame
files = files.join(files_split)

#    fullpath         path       filename
# 0  root\dir1\A.csv  root\dir1   A.csv
# 1  root\dir1\B.csv  root\dir1   B.csv
# 2  root\dir2\A.csv  root\dir2   A.csv
# 3  root\dir2\B.csv  root\dir2   B.csv

# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath'] # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths] # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs) # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f) # Save dataframe
    

When I run it, I get this instead:

distance,x_u,y_u,u_comp

0,0,.01,.001

1,1,.01,.004

2,2,.03,.002

.....

distance,x_v,y_v,v_comp

0,0,.01,5

1,1,.01,5.2

2,2,.03,4.98

I believe it has something to do with the concat_df = pd.concat(dfs) line, but I'm not sure what to change. I've looked at other examples here, but most of them don't involve a loop and I'm not sure how to make it work.

Thanks in advance!
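
For reference, pd.concat stacks frames row-wise by default (axis=0), and header=None makes pandas treat each header line as an ordinary data row, which matches the stacked output above. A minimal sketch of one possible fix is below, assuming every file has a distance column and that a key-based join is what is wanted. The helper name merge_on_key and the inline sample strings are made up for illustration; how="outer" is also an assumption, chosen so that distances present in only one file are kept.

```python
from functools import reduce
import io

import pandas as pd


def merge_on_key(dfs, key="distance"):
    """Join a list of DataFrames column-wise on a shared key column.

    Hypothetical helper, not part of pandas. how="outer" keeps rows whose
    key appears in only some of the frames (an assumption about intent).
    """
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"),
        dfs,
    )


# Inline stand-ins for root/dir1/A.csv and root/dir2/A.csv from the question.
csv_u = "distance,x_u,y_u,u_comp\n0,0,.01,.001\n1,1,.01,.004\n2,2,.03,.002\n"
csv_v = "distance,x_v,y_v,v_comp\n0,0,.01,5\n1,1,.01,5.2\n2,2,.03,4.98\n"

# Read WITH the header row (no header=None), so column names are preserved.
dfs = [pd.read_csv(io.StringIO(text)) for text in (csv_u, csv_v)]

merged = merge_on_key(dfs)

# index=False avoids writing the DataFrame index as an extra first column.
print(merged.to_csv(index=False))
```

Inside the loop from the question, this would roughly mean reading each path with pd.read_csv(path), replacing pd.concat(dfs) with the merge, and saving with to_csv(CONCAT_DIR + f, index=False).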
