I have a script that fills values from a file (df4) into an existing dataframe (df3). However, df3 already contains columns filled with values, and those existing values are set to NaN by the following script:

df5 = df4.pivot_table(index='source', columns='plasmidgene', values='identity').reindex(index=df3.index, columns=df3.columns)

How can I avoid overwriting my existing values? Thanks

For example, I have df1:

    a   b    c    d    e    f
1   1  30  NaN  NaN  NaN  NaN
2   2   3  NaN  NaN  NaN  NaN
3  16   1  NaN  NaN  NaN  NaN

and df2:

   source plasmidgene  identity
1       1           d        80
2       2           e        90
3       3           c        60

And I want to create this:

    a   b    c    d    e    f
1   1  30  NaN   80  NaN  NaN
2   2   3  NaN  NaN   90  NaN
3  16   1   60  NaN  NaN  NaN
Gravel

1 Answer

I think you can use combine_first:

df = df2.pivot_table(index='source', columns='plasmidgene', values='identity') \
        .reindex(index=df1.index, columns=df1.columns) \
        .combine_first(df1)

print(df)
      a     b     c     d     e   f
1   1.0  30.0   NaN  80.0   NaN NaN
2   2.0   3.0   NaN   NaN  90.0 NaN
3  16.0   1.0  60.0   NaN   NaN NaN

print(df.dtypes)
a    float64
b    float64
c    float64
d    float64
e    float64
f    float64
dtype: object
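
For reference, here is a self-contained sketch of the same approach, built from the example data in the question. The df2 column names (source, plasmidgene, identity) are taken from the pivot_table call, and the intermediate names pivoted, aligned and result are only illustrative:

```python
import numpy as np
import pandas as pd

# df1: the existing frame; columns c..f are still empty (NaN).
df1 = pd.DataFrame(
    {"a": [1, 2, 16], "b": [30, 3, 1],
     "c": np.nan, "d": np.nan, "e": np.nan, "f": np.nan},
    index=[1, 2, 3],
)

# df2: long-format values to fill in (column names assumed from the pivot_table call).
df2 = pd.DataFrame(
    {"source": [1, 2, 3], "plasmidgene": ["d", "e", "c"], "identity": [80, 90, 60]}
)

# Step 1: long -> wide. Only columns c, d, e exist at this point.
pivoted = df2.pivot_table(index="source", columns="plasmidgene", values="identity")

# Step 2: align to df1's shape. Columns a, b, f are created as all-NaN,
# which is why the existing values appear to be "overwritten".
aligned = pivoted.reindex(index=df1.index, columns=df1.columns)

# Step 3: fill the NaN holes in the aligned frame with the values from df1.
result = aligned.combine_first(df1)

print(result)
```

Values from the pivot take priority wherever they are not NaN, and df1 only fills the remaining gaps. The exact dtypes of the result may differ between pandas versions.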

Using fillna instead is problematic: the columns end up as object dtype rather than float64, so don't use it here. It looks like a bug:

df = df2.pivot_table(index='source', columns='plasmidgene', values='identity') \
        .reindex(index=df1.index, columns= df1.columns) \
        .fillna(df1)

print(df)
    a   b    c    d    e    f
1   1  30  NaN   80  NaN  NaN
2   2   3  NaN  NaN   90  NaN
3  16   1   60  NaN  NaN  NaN

print(df.dtypes)
a    object
b    object
c    object
d    object
e    object
f    object
dtype: object
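
If some columns do end up as object dtype, one possible workaround is to convert them back with pd.to_numeric. This is only a sketch: the small frame obj_df below is made up for illustration and is not the output of the fillna call above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose numeric values are stored as object dtype.
obj_df = pd.DataFrame(
    {"a": [1, 2, 16], "d": [80, np.nan, np.nan]},
    index=[1, 2, 3],
).astype(object)

# Convert each column back to a numeric dtype.
fixed = obj_df.apply(pd.to_numeric)

print(fixed.dtypes)
```

Columns that contain NaN come back as float64, because NaN cannot be stored in a plain integer column.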
jezrael