2 dataframes, same number of columns, different number of rows comparing and replacing values

Question

I think this is an easy question, and I know where to look (merge, join, loc, iloc, or one of those functions), but I have not figured it out yet. Here is a simplified example of what I want to do. df1 and df2 have the same columns but a different number of rows. I want to find the rows where column "t1" has the same value in both dataframes, and for those rows replace the value in column "c1" of df1 with the value of column "c1" from df2. I also tried the functions where and replace, but I am pretty sure I need merge or join. Thank you.

import pandas as pd
df1 = pd.DataFrame()
df2 = pd.DataFrame()

# close price
df1.at[0,"c1"]=0
df1.at[1,"c1"]=0
df1.at[2,"c1"]=0
df1.at[3,"c1"]=0
df1.at[4,"c1"]=0
df1.at[5,"c1"]=0
df1.at[6,"c1"]=0
df1.at[7,"c1"]=0

df2.at[0,"c1"]=20
df2.at[1,"c1"]=26
df2.at[3,"c1"]=23
df2.at[4,"c1"]=21

# time stamp
df1.at[0,"t1"]=3
df1.at[1,"t1"]=4
df1.at[2,"t1"]=5
df1.at[3,"t1"]=6
df1.at[4,"t1"]=7
df1.at[5,"t1"]=8
df1.at[6,"t1"]=9
df1.at[7,"t1"]=10

df2.at[0,"t1"]=5
df2.at[1,"t1"]=6
df2.at[3,"t1"]=7
df2.at[4,"t1"]=8

They look like:

>>> df1
    c1    t1
0  0.0   3.0
1  0.0   4.0
2  0.0   5.0
3  0.0   6.0
4  0.0   7.0
5  0.0   8.0
6  0.0   9.0
7  0.0  10.0
>>> df2
     c1   t1
0  20.0  5.0
1  26.0  6.0
3  23.0  7.0
4  21.0  8.0

So I want df1 to look like the frame shown below. At the rows where the value for "t1" is the same for both df1 and df2 I want to replace the values in column "c1" in df1 with the values from df2.

>>> df1
     c1    t1
0   0.0   3.0
1   0.0   4.0
2  20.0   5.0
3  26.0   6.0
4  23.0   7.0
5  21.0   8.0
6   0.0   9.0
7   0.0  10.0
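For comparison with the merge-based answers below, the loc route mentioned in the question can be sketched as follows. This is a minimal sketch, not taken from either answer: it swaps in isin and Series.map, assumes t1 values are unique in df2, and uses the illustrative names `lookup` and `mask`.

```python
import pandas as pd

# Rebuild the example frames from the question (float columns, df2 with a gap in its index)
df1 = pd.DataFrame({"c1": [0.0] * 8, "t1": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})
df2 = pd.DataFrame({"c1": [20.0, 26.0, 23.0, 21.0], "t1": [5.0, 6.0, 7.0, 8.0]},
                   index=[0, 1, 3, 4])

# Lookup Series mapping t1 -> c1 from df2 (assumes t1 is unique in df2)
lookup = df2.set_index("t1")["c1"]

# Boolean mask of the df1 rows whose t1 also appears in df2
mask = df1["t1"].isin(df2["t1"])

# Overwrite c1 only on the matching rows; the assigned Series aligns on df1's index
df1.loc[mask, "c1"] = df1.loc[mask, "t1"].map(lookup)

print(df1)
```

Unlike a merge, this never rebuilds df1, so its index and column order stay exactly as they were.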

score 2 · Accepted Answer · answered Oct 10 '22 at 09:40


You can use pd.merge for this:

df1 = df1.merge(df2, on=['t1'], how='left')

Which results in:

   c1_x    t1  c1_y
0   0.0   3.0   NaN
1   0.0   4.0   NaN
2   0.0   5.0  20.0
3   0.0   6.0  26.0
4   0.0   7.0  23.0
5   0.0   8.0  21.0
6   0.0   9.0   NaN
7   0.0  10.0   NaN

This adds a new column, c1_y, holding the values merged in from df2 (NaN where t1 has no match in df2); df1's original column is renamed to c1_x. To create the desired output, we only need to do the following:

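# take df2's value where t1 matched, otherwise keep df1's original value
df1['c1'] = df1.c1_y.fillna(df1.c1_x)
# keep only the original columns
df1 = df1[['c1', 't1']]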

Output:

     c1    t1
0   0.0   3.0
1   0.0   4.0
2  20.0   5.0
3  26.0   6.0
4  23.0   7.0
5  21.0   8.0
6   0.0   9.0
7   0.0  10.0
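A self-contained variant of the same left merge might look like this. It is a sketch, and the extra arguments are additions rather than part of the answer: `suffixes=("", "_new")` keeps df1's column named c1 (the name `c1_new` is arbitrary), and `validate="many_to_one"` makes pandas raise a MergeError if t1 is duplicated in df2, which would otherwise silently duplicate df1 rows.

```python
import pandas as pd

df1 = pd.DataFrame({"c1": [0.0] * 8, "t1": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})
df2 = pd.DataFrame({"c1": [20.0, 26.0, 23.0, 21.0], "t1": [5.0, 6.0, 7.0, 8.0]},
                   index=[0, 1, 3, 4])

# Left merge keeps every df1 row; df1's c1 keeps its name, df2's becomes c1_new.
# validate="many_to_one" checks that t1 is unique in df2.
merged = df1.merge(df2, on="t1", how="left", suffixes=("", "_new"),
                   validate="many_to_one")

# Take df2's value where t1 matched, otherwise keep df1's original value
merged["c1"] = merged["c1_new"].fillna(merged["c1"])

# Drop the helper column, restoring df1's original column layout
df1 = merged.drop(columns="c1_new")
print(df1)
```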

T C Molenaar

I wanted to post a reply with an example, but it seems I can only add short comments. It works even better than I hoped: I was creating a "dummy" column "c1" in df1 filled with zeros, and that is not necessary. Too bad I cannot add my expanded example. I tested it on my real data and it works great – dutchy12345 Oct 10 '22 at 13:09
Indeed, without the dummy column it works more elegantly ;) – T C Molenaar Oct 10 '22 at 13:22
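The version without the dummy column that the comments describe might look like the minimal sketch below. The fillna(0.0) step is an assumption added to reproduce the zeros of the original placeholder column; drop it to keep NaN for timestamps that have no match in df2.

```python
import pandas as pd

# df1 holds only the timestamps; there is no placeholder "c1" column of zeros
df1 = pd.DataFrame({"t1": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})
df2 = pd.DataFrame({"c1": [20.0, 26.0, 23.0, 21.0], "t1": [5.0, 6.0, 7.0, 8.0]},
                   index=[0, 1, 3, 4])

# With no overlapping "c1" column there are no _x/_y suffixes:
# the result has columns t1 and c1, with NaN where t1 has no match in df2
df1 = df1.merge(df2, on="t1", how="left")

# Optional (assumption): fill unmatched rows with 0 to mimic the old placeholder
df1["c1"] = df1["c1"].fillna(0.0)
print(df1)
```

Because the left frame's columns come first, the result's column order is t1, c1 rather than c1, t1.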

score 1 · Answer 2 · answered Oct 10 '22 at 09:53

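Simply use merge:

# how='left' keeps every df1 row in its original order; assuming df1 has a default
# 0..n-1 index and t1 is unique in df2, res lines up with df1 row for row
res = pd.merge(df1, df2, on='t1', how='left')
# assign to c1 (not t1): take df2's value where t1 matched, else keep df1's
df1['c1'] = res['c1_y'].fillna(df1['c1'])

print(df1)

###output:
###     c1    t1
###0   0.0   3.0
###1   0.0   4.0
###2  20.0   5.0
###3  26.0   6.0
###4  23.0   7.0
###5  21.0   8.0
###6   0.0   9.0
###7   0.0  10.0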
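The merge-then-assign pattern depends on the merge result lining up with df1's index. As a swapped-in alternative that is not from either answer, DataFrame.update can align the two frames on t1 directly. This is a sketch that assumes t1 is unique in df2; `left` and `right` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"c1": [0.0] * 8, "t1": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]})
df2 = pd.DataFrame({"c1": [20.0, 26.0, 23.0, 21.0], "t1": [5.0, 6.0, 7.0, 8.0]},
                   index=[0, 1, 3, 4])

# Use t1 as the index of both frames so rows align by timestamp, not by row label
left = df1.set_index("t1")
right = df2.set_index("t1")

# update() modifies `left` in place (it returns None), overwriting c1 with
# non-NA values from `right` wherever the t1 index values match
left.update(right)

# Move t1 back to a column and restore the original column order
df1 = left.reset_index()[["c1", "t1"]]
print(df1)
```

Because update only copies non-NA values, rows of df1 whose t1 is absent from df2 keep their existing c1 value without an explicit fillna.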
