6

I have a column with na values that I want to fill according to values from another data frame according to a key. I was wondering if there is any simple way to do so.

Example: I have a data frame of objects and their colors like this:

  object   color
0  chair   black
1   ball  yellow
2   door   brown
3   ball     **NaN**
4  chair   white
5  chair     **NaN**
6   ball    grey

I want to fill na values in the color column with default color from the following data frame:

  object default_color
0  chair         brown
1   ball          blue
2   door          grey

So the result will be this:

  object   color
0  chair   black
1   ball  yellow
2   door   brown
3   ball     **blue**
4  chair   white
5  chair     **brown**
6   ball    grey

Is there any "easy" way to do this?

Thanks :)

cs95
  • 379,657
  • 97
  • 704
  • 746
Kuzenbo
  • 229
  • 4
  • 9

3 Answers3

9

Use np.where and mapping by setting a column as index i.e

df['color']= np.where(df['color'].isnull(),df['object'].map(df2.set_index('object')['default_color']),df['color'])

or df.where

df['color'] = df['color'].where(df['color'].notnull(), df['object'].map(df2.set_index('object')['default_color'])) 
 object   color
0  chair   black
1   ball  yellow
2   door   brown
3   ball    blue
4  chair   white
5  chair   brown
6   ball    grey
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
7

First create Series and then replace NaNs:

s = df1['object'].map(df2.set_index('object')['default_color'])
print (s)
0    brown
1     blue
2     grey
3     blue
4    brown
5    brown
6     blue
Name: object, dtype: object
df1['color']= df1['color'].mask(df1['color'].isnull(), s)

Or:

df1.loc[df1['color'].isnull(), 'color'] = s

Or:

df1['color'] = df1['color'].combine_first(s)

Or:

df1['color'] = df1['color'].fillna(s)

print (df1)
  object   color
0  chair   black
1   ball  yellow
2   door   brown
3   ball    blue
4  chair   white
5  chair   brown
6   ball    grey

If unique values in object:

df = df1.set_index('object')['color']
        .combine_first(df2.set_index('object')['default_color'])
        .reset_index()

Or:

df = df1.set_index('object')['color']
        .fillna(df2.set_index('object')['default_color'])
        .reset_index()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
5

Using loc + map:

m = df.color.isnull()
df.loc[m, 'color'] = df.loc[m, 'object'].map(df2.set_index('object').default_color)

df

  object   color
0  chair   black
1   ball  yellow
2   door   brown
3   ball    blue
4  chair   white
5  chair   brown
6   ball    grey

If you're going to be doing a lot of these replacements, you should call set_index on df2 just once and save its result.

cs95
  • 379,657
  • 97
  • 704
  • 746