
I have a bunch of CSV files, each containing data for a specific time, and the time is encoded in the filename:

time1.csv
Label val1 val2
a      5    6
b      6    4

time2.csv
Label val1 val2
a      5    6
c      6    4

...

I can read each file into a Pandas DataFrame. I then want to add a "time" column (derived from the filename) to each DataFrame, and merge all of the DataFrames into a single DataFrame.

Is there a way to accomplish this?

Michael Szczesny
k2k2e6
  • What value should the time column contain? You should add the expected result to your question; as it stands, it's hard to say what you really want. – Valentino Oct 10 '20 at 22:53

2 Answers


Given the list of filenames, you can assign a time column to equal the filename (excluding the last four characters, i.e. .csv) and concatenate the result.

import pandas as pd

df = pd.concat([pd.read_csv(filename).assign(time=filename[:-4])  # drop the '.csv' suffix
                for filename in filenames])
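
If the list of filenames isn't already in hand, one way to build it (a sketch, assuming the CSVs sit in the current working directory and follow the time*.csv pattern) is to collect them with glob and strip the extension with os.path.splitext instead of slicing off the last four characters:

import glob
import os
import pandas as pd

# Assumption: all per-time CSVs live in the working directory and match time*.csv
filenames = sorted(glob.glob('time*.csv'))

# splitext drops the '.csv' suffix regardless of its length
df = pd.concat([pd.read_csv(f).assign(time=os.path.splitext(os.path.basename(f))[0])
                for f in filenames])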
Alexander

You can create a list of (time, DataFrame) pairs, then iterate over it and append each frame to finaldf:

import pandas as pd

finaldf = pd.DataFrame()
for timename, df in times:  # times: list of (time, DataFrame) pairs
    finaldf = finaldf.append(df.assign(time=timename), ignore_index=True)

print(finaldf)

To do it more automatically, you can put the filenames in a list and read them inside the loop as well, like this:

import pandas as pd

finaldf = pd.DataFrame()
times = ['time1', 'time2']
for csv in times:
    df = pd.read_csv('{}.csv'.format(csv))  # read time1.csv, time2.csv, ...
    finaldf = finaldf.append(df.assign(time=csv), ignore_index=True)

print(finaldf)

The result would be:

Label val1 val2   time
a      5    6     time1
b      6    4     time1
a      5    6     time2
c      6    4     time2
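
If the number of files grows, a variant (a sketch, reusing the same time1/time2 names) that avoids the repeated copying raised in the comments below is to build all the frames first and concatenate once at the end:

import pandas as pd

times = ['time1', 'time2']
# Build every frame first, then concatenate a single time instead of appending in the loop
frames = [pd.read_csv('{}.csv'.format(csv)).assign(time=csv) for csv in times]
finaldf = pd.concat(frames, ignore_index=True)
print(finaldf)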
Mehdi Golzadeh
  • Isn't the request to merge or concat the DataFrames? – Joe Ferndz Oct 10 '20 at 23:02
  • I added the final output of my code. You can also use "concat". merge is used to join two dfs on some columns, not to concatenate them by columns or rows. – Mehdi Golzadeh Oct 10 '20 at 23:07
  • This solution is known as a quadratic copy and grows exponentially slower: https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower – Alexander Oct 10 '20 at 23:15
  • @Alexander I don't think it gets slower for such a small number of files; it may slow down for loops over 10000 iterations. But your solution works fine as well. – Mehdi Golzadeh Oct 10 '20 at 23:21
  • Create 100 files and time the results between the two methods. It will become clear why one should avoid appending to dataframes within a loop. – Alexander Oct 10 '20 at 23:24
  • @k2k2e6 you can use finaldf = pd.concat([finaldf, df]) in the provided code – Mehdi Golzadeh Oct 10 '20 at 23:25
  • No, that won’t make it any faster. I suggest you read the link I provided. – Alexander Oct 11 '20 at 00:15