I have two data frames that share the same index labels:

the first one:
           0         1         2
2   0.011765  0.490196  0.015686
2   0.011765  0.490196  0.015686
2   0.007843  0.494118  0.007843
2   0.007843  0.494118  0.007843
2   0.007843  0.501961  0.011765
..       ...       ...       ...
0   0.000000  0.031373  0.039216
0   0.031373  0.082353  0.105882
0   0.094118  0.149020  0.192157
0   0.094118  0.156863  0.215686

[337962 rows x 3 columns]

and the second one:

          0         1         2
0  0.055852  0.118138  0.052386
1  0.453661  0.665857  0.441551
2  0.096394  0.635641  0.068524
3  0.952545  0.827438  0.047632
4  0.787729  0.823494  0.795792
5  0.050284  0.549379  0.592593
6  0.608805  0.215458  0.068293
7  0.775640  0.091352  0.689224

The first DataFrame is very large. I need to replace its values with the values from the small DataFrame that have the same index label, as quickly as possible. How can I do this? Thanks for any help.

2 Answers

Use the index of the second dataframe to slice the first one and then assign.

df1.loc[df2.index] = df2
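
When df1 holds repeated labels, as in the question, another option is to look each row up in df2 rather than assign through .loc (in recent pandas versions, df1.loc[df2.index] raises a KeyError if any label of df2 is missing from df1). The following is only a sketch, not the answerer's method: it assumes df2 has a unique index, the small frames are cut-down stand-ins for the question's data, and label 9 is made up to show what happens when df2 has no matching row.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the frames in the question (values copied from it).
# Label 9 is hypothetical: it is absent from df2, to show the fallback.
df1 = pd.DataFrame(
    [[0.011765, 0.490196, 0.015686],
     [0.007843, 0.494118, 0.007843],
     [0.000000, 0.031373, 0.039216],
     [0.5, 0.5, 0.5]],
    index=[2, 2, 0, 9],
)
df2 = pd.DataFrame(
    [[0.055852, 0.118138, 0.052386],
     [0.453661, 0.665857, 0.441551],
     [0.096394, 0.635641, 0.068524]],
    index=[0, 1, 2],
)

# One df2 row per df1 row, in df1's order; labels missing from df2 give NaN.
looked_up = df2.reindex(df1.index)

# Keep the original df1 row wherever df2 has no matching label.
found = df1.index.isin(df2.index)
values = np.where(found[:, None], looked_up.to_numpy(), df1.to_numpy())
result = pd.DataFrame(values, index=df1.index, columns=df1.columns)
print(result)
```

If every label of df1 is known to exist in df2, the reindex call alone should be enough and the np.where step can be dropped.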
Stop harming Monica

You can merge df1 with its columns dropped (df1[[]], which keeps only the index) with df2 on their indexes:

import pandas as pd
print(pd.merge(df1[[]], df2, left_index=True, right_index=True))
          0         1         2
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524

Or join:

print(df1[[]].join(df2))
          0         1         2
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524

If you need to preserve the index order of df1, call reset_index on both frames, merge on the resulting index column, and then restore it with set_index:

df = pd.merge(df1[[]].reset_index(), df2.reset_index(), on='index').set_index('index')
df.index.name = None
print(df)

          0         1         2
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
2  0.096394  0.635641  0.068524
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
0  0.055852  0.118138  0.052386
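
The reset_index / merge / set_index chain above can be tried end to end on a small sample. This is a sketch under assumptions: the four-row df1 and three-row df2 are cut-down stand-ins for the question's data, and the name merged is chosen here for illustration.

```python
import pandas as pd

# Cut-down stand-ins for the question's frames (values copied from it).
df1 = pd.DataFrame(
    [[0.011765, 0.490196, 0.015686],
     [0.007843, 0.494118, 0.007843],
     [0.000000, 0.031373, 0.039216],
     [0.031373, 0.082353, 0.105882]],
    index=[2, 2, 0, 0],
)
df2 = pd.DataFrame(
    [[0.055852, 0.118138, 0.052386],
     [0.453661, 0.665857, 0.441551],
     [0.096394, 0.635641, 0.068524]],
    index=[0, 1, 2],
)

# df1[[]] keeps df1's index but drops every column, so only df2's values
# end up in the result. reset_index turns each index into an 'index' column.
merged = pd.merge(
    df1[[]].reset_index(), df2.reset_index(), on="index"
).set_index("index")
merged.index.name = None
print(merged)
```

On this sample the result has one row per row of df1, with the rows for label 2 first and those for label 0 after them, matching df1.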
jezrael