
I'm new to pandas and am trying to replace the NaN values in a column of df1 with the values from df2 where unique_col matches, but I get the error shown further down.

df1
unique_col  |  Measure
944537          NaN
7811403         NaN 
8901242114307     1 

df2
unique_col  |  Measure
944537           18
7811403          12 
8901242114307    17.5



df1.loc[(df1.unique_col.isin(df2.unique_col) &
                       df1.Measure.isnull()), ['Measure']] = df2[['Measure']]

I have two DataFrames with 3 million records, and performing the operation above raises the following error:

ValueError: cannot reindex from a duplicate axis

James Z
kashyap

1 Answer


An easy way to fill NaNs is to use the fillna function. In your case, if your DataFrames look like this (notice the indexes):

    unique_col      Measure
0   944537          NaN
1   7811403         NaN
2   8901242114307   1.0


    unique_col      Measure
0   944537          18.0
1   7811403         12.0
2   8901242114307   17.5

you can simply call

>>> df1.fillna(df2)


    unique_col       Measure
0   944537           18.0
1   7811403          12.0
2   8901242114307    1.0
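
For reference, here is a self-contained sketch of the call above. It rebuilds the two frames from the sample data in the question (the names df1 and df2 follow the question; the variable filled is only introduced here for illustration):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; both frames use the default 0..2 index.
df1 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [np.nan, np.nan, 1.0],
})
df2 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [18.0, 12.0, 17.5],
})

# fillna aligns on index and column labels, so only the NaN cells of df1
# are taken from df2; the existing value 1.0 is left untouched.
filled = df1.fillna(df2)
print(filled)
```

Note that fillna returns a new DataFrame, so assign the result if you want to keep it.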

If the indexes are not aligned as above, you can set unique_col as the index of both frames and use the same function:

df1 = df1.set_index('unique_col')
df1.fillna(df2.set_index('unique_col'))
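
The "cannot reindex from a duplicate axis" error in the question typically means that one of the indexes being aligned contains repeated labels, and index-aligned operations may hit the same problem if unique_col has duplicates. The following is only a sketch of one possible workaround: it assumes that when df2 has repeated unique_col values it is acceptable to keep the first one (an assumption not stated in the question), and the sample rows and the name lookup are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 944537 appears twice in both frames.
df1 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307, 944537],
    "Measure": [np.nan, np.nan, 1.0, np.nan],
})
df2 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307, 944537],
    "Measure": [18.0, 12.0, 17.5, 99.0],
})

# Build a key -> value lookup with a unique index.
# ASSUMPTION: keeping the first occurrence of each key is acceptable.
lookup = df2.drop_duplicates("unique_col").set_index("unique_col")["Measure"]

# Map each key in df1 to its df2 value, then fill only the NaN cells.
df1["Measure"] = df1["Measure"].fillna(df1["unique_col"].map(lookup))
print(df1)
```

This variant keeps df1's original index and row order instead of re-indexing it by unique_col.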
rafaelc