How to fill in data in a dataframe while keeping the existing values

Question

I have a script that fills values from a file (df4) into an existing dataframe (df3). However, df3 already contains columns filled with values, and the following script sets those existing values to NaN:

df5 = df4.pivot_table(index='source', columns='plasmidgene', values='identity').reindex(index=df3.index, columns=df3.columns)

How can I avoid overwriting my existing values? Thanks

For example, I have df1

   a   b   c    d   e   f
1  1   30  Nan Nan Nan Nan
2  2   3   Nan Nan Nan Nan
3  16  1   Nan Nan Nan Nan

df2

 1   1  d   80
 2   2  e   90
 3   3  c   60

And I want to create this

   a   b   c   d   e   f
1  1  30  Nan 80  Nan Nan
2  2   3  Nan Nan 90  Nan
3 16   1  60  Nan Nan Nan

See: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — languitar, Apr 06 '17 at 08:36

jezrael · Accepted Answer · 2017-04-06T08:55:35.520

I think you can use combine_first:

df = df2.pivot_table(index='source', columns='plasmidgene', values='identity') \
        .reindex(index=df1.index, columns=df1.columns) \
        .combine_first(df1)

print (df)
      a     b     c     d     e   f
1   1.0  30.0   NaN  80.0   NaN NaN
2   2.0   3.0   NaN   NaN  90.0 NaN
3  16.0   1.0  60.0   NaN   NaN NaN

print (df.dtypes)
a    float64
b    float64
c    float64
d    float64
e    float64
f    float64
dtype: object
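
To make the approach above reproducible, here is a minimal self-contained sketch. The question does not give the column names of df2, so `source`, `plasmidgene` and `identity` are assumed from the `pivot_table` call, and the sample frames are rebuilt by hand from the tables in the question.

```python
import numpy as np
import pandas as pd

# Existing frame: columns a and b hold values, c..f are still empty.
df1 = pd.DataFrame(
    {"a": [1, 2, 16], "b": [30, 3, 1],
     "c": np.nan, "d": np.nan, "e": np.nan, "f": np.nan},
    index=[1, 2, 3],
)

# New values in long format (column names are an assumption, see above).
df2 = pd.DataFrame(
    {"source": [1, 2, 3],
     "plasmidgene": ["d", "e", "c"],
     "identity": [80, 90, 60]}
)

# Pivot to wide format and align it to the shape of df1.
# After reindex, columns a and b are all NaN in the pivoted frame.
pivoted = (
    df2.pivot_table(index="source", columns="plasmidgene", values="identity")
    .reindex(index=df1.index, columns=df1.columns)
)

# combine_first keeps the pivoted values and fills only the
# remaining NaN cells from df1, so a and b are restored.
result = pivoted.combine_first(df1)
print(result)
print(result.dtypes)
```

The reindex step alone is what blanks out `a` and `b`: those columns do not exist in the pivoted data, so they come back as all NaN until `combine_first` fills them from df1.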

Using fillna here is problematic: in this example it leaves the columns as object dtype instead of converting them to float64, so don't use it. It looks like a bug:

df = df2.pivot_table(index='source', columns='plasmidgene', values='identity') \
        .reindex(index=df1.index, columns=df1.columns) \
        .fillna(df1)

print (df)
    a   b    c    d    e    f
1   1  30  NaN   80  NaN  NaN
2   2   3  NaN  NaN   90  NaN
3  16   1   60  NaN  NaN  NaN

print (df.dtypes)
a    object
b    object
c    object
d    object
e    object
f    object
dtype: object
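
If a frame has already ended up with object columns that really hold numbers, one possible way to check and repair it is sketched below. This is not part of the original answer: the small frame is hypothetical, and `pd.to_numeric` with `errors="coerce"` is just one option, which turns anything it cannot parse into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns are object dtype although they hold numbers.
df = pd.DataFrame(
    {"a": [1, 2, 16], "d": [80, np.nan, np.nan]},
    index=[1, 2, 3],
    dtype=object,
)
print(df.dtypes)

# Convert each column to a numeric dtype; unparseable entries become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
```

Note the attribute is `dtypes` (plural) on a DataFrame; `dtype` exists only on a single column.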

edited Apr 06 '17 at 08:55

answered Apr 06 '17 at 08:39

jezrael


Yes, the last option works great! Thank you so much! – Gravel Apr 06 '17 at 08:46
In my opinion, it is better to use `combine_first`, because mixed types are problematic: some pandas functions are buggy with them. – jezrael Apr 06 '17 at 08:52
If I use combine_first, I get the following error [AttributeError: 'DataFrame' object has no attribute 'dtype'] – Gravel Apr 06 '17 at 13:47
Maybe a typo: it should be `print (df.dtypes)`, with an `s`. It is only a check. – jezrael Apr 06 '17 at 13:48
all dtypes are object? – Gravel Apr 06 '17 at 14:37
Numeric values need `int` or `float`; strings need `object`, since they are obviously strings. – jezrael Apr 06 '17 at 14:38
