-1

I am a beginner in python. I have a hundred pair of CSV file. The file looks like this:

25_13oct_speed_0.csv    
26_13oct_speed_0.csv
25_13oct_speed_0.1.csv
26_13oct_speed_0.1.csv
25_13oct_speed_0.2.csv
26_13oct_speed_0.2.csv
and others

I want to concatenate the pair files between 25 and 26 file. each pair of the file has a speed threshold (Speed_0, 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0) which is labeled on the file name. These files have the same structure data.

Mac Annotation  X   Y
A   first       0   0
A   last        0   0
B   first       0   0
B   last        0   0

Therefore, concatenate analyze is enough to join these two data. I use this method:

df1 = pd.read_csv('25_13oct_speed_0.csv')
df2 = pd.read_csv('26_13oct_speed_0.csv')
frames = [df1, df2]
result = pd.concat(frames)

for each pair files. but it takes time and not an elegant way. is there a good way to combine automatically the pair file and save simultaneously?

Arief Hidayat
  • 937
  • 1
  • 8
  • 19
  • 5
    Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – PV8 Jun 21 '19 at 12:54
  • 1
    Conceptually, you may be able to extract the key elements you are trying to match by (`25/26` and `0/0.1/0.2/etc.`), create a key based on those, sort the .csv names by that key, and then iterate over those to read in and concatenate like pairs of dataframes. – Brendan Jun 21 '19 at 15:59
  • thank you @BrendanCox, I will try to make a key based on the file name – Arief Hidayat Jun 22 '19 at 02:44

1 Answers1

1

Idea is create DataFrame by list of files and add 2 new columns by Series.str.split by first _:

print (files)
['25_13oct_speed_0.csv', '26_13oct_speed_0.csv', 
 '25_13oct_speed_0.1.csv', '26_13oct_speed_0.1.csv', 
 '25_13oct_speed_0.2.csv', '26_13oct_speed_0.2.csv']

df1 = pd.DataFrame({'files': files})
df1[['g','names']] = df1['files'].str.split('_', n=1, expand=True)
print (df1)
                    files   g                names
0    25_13oct_speed_0.csv  25    13oct_speed_0.csv
1    26_13oct_speed_0.csv  26    13oct_speed_0.csv
2  25_13oct_speed_0.1.csv  25  13oct_speed_0.1.csv
3  26_13oct_speed_0.1.csv  26  13oct_speed_0.1.csv
4  25_13oct_speed_0.2.csv  25  13oct_speed_0.2.csv
5  26_13oct_speed_0.2.csv  26  13oct_speed_0.2.csv

Then loop per groups per column names, loop by groups with DataFrame.itertuples and create new DataFrame with read_csv, if necessary add new column filled by values from g, append to list, concat and last cave to new file by name from column names:

for i, g in df1.groupby('names'):
    out = []
    for n in g.itertuples():
        df = pd.read_csv(n.files).assign(source=n.g)
        out.append(df)
    dfbig = pd.concat(out, ignore_index=True)
    print (dfbig)
    dfbig.to_csv(g['names'].iat[0])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252