How to pass another entire column as argument to pandas fillna()

Question

I would like to fill missing values in one column with values from another column, using fillna method.

(I read that looping through each row would be very bad practice and that it would be better to do everything in one go but I could not find out how to do it with fillna.)

Data before:

Day  Cat1  Cat2
1    cat   mouse
2    dog   elephant
3    cat   giraf
4    NaN   ant

Data after:

Day  Cat1  Cat2
1    cat   mouse
2    dog   elephant
3    cat   giraf
4    ant   ant

joris · Accepted Answer · 2015-05-20T18:16:55.510

255

You can provide this column to fillna (see docs), it will use those values on matching indexes to fill:

In [17]: df['Cat1'].fillna(df['Cat2'])
Out[17]:
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object

edited May 20 '15 at 18:16

answered May 20 '15 at 18:14

joris

133,120
36
247
202

11

Nice! I did not know that ``fillna`` takes a series. – Ami Tavory May 20 '15 at 18:15
2

Thanks! I thought the serie had to be the exact size of the number of NA values. – xav May 20 '15 at 21:04
1

It works also for dataframes for multicolumnar rows. That feature of fillna is very helpful. – Wertikal May 30 '19 at 09:14

score 30 · Answer 2 · edited May 23 '17 at 10:31

30

You could do

df.Cat1 = np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)

The overall construct on the RHS uses the ternary pattern from the pandas cookbook (which it pays to read in any case). It's a vector version of a? b: c.

edited May 23 '17 at 10:31

Community

1
1

answered May 20 '15 at 18:12

Ami Tavory

74,578
11
141
185

1

Not the solution I used for this problem but very interesting pattern! Thanks! – xav May 20 '15 at 21:04
is there a way of using this for multiple columns? e.g. if this df had cat1, cat2, cat3, cat4,cat5 and let's say cat5 was empty. would there be a way to fill cat5 with values from cat1 if cat1 empty then cat2, if cat2 empty then cat3 etc.? – user8322222 Feb 14 '19 at 12:36
@user8322222 I am definitely late but if anybody is having this question, you can do nested np.where, just like you would do in excel cell = np.where(cond, val_true, np.where(cond, val_true, val_false),). – Pab May 23 '19 at 19:06
You want to mention that this is just redefining the pandas builtin `pd.DataFrame.fillna()`. And I suspect the corner-case behavior may differ e.g. for mismatched series lengths from different dataframes: dfA['Cat1'], dfB['Cat2'] – smci Jul 08 '19 at 02:08

score 15 · Answer 3 · answered May 20 '15 at 18:14

15

Just use the value parameter instead of method:

In [20]: df
Out[20]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  NaN       ant    4

In [21]: df.Cat1 = df.Cat1.fillna(value=df.Cat2)

In [22]: df
Out[22]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  ant       ant    4

answered May 20 '15 at 18:14

chrisaycock

36,470
14
88
125

1

Thanks for the answer! What does it change to use value rather than the method described by joris? – xav May 20 '15 at 21:06
@xav `value` is the first parameter, so joris is actually doing the exact same thing. As he said, see the [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html). – chrisaycock May 20 '15 at 22:02
Yes, the docstring is a bit misleading as `method` is listed there first. – joris May 21 '15 at 06:27

score 10 · Answer 4 · answered Dec 04 '18 at 08:17

pandas.DataFrame.combine_first also works.

(Attention: since "Result index columns will be the union of the respective indexes and columns", you should check the index and columns are matched.)

import numpy as np
import pandas as pd
df = pd.DataFrame([["1","cat","mouse"],
    ["2","dog","elephant"],
    ["3","cat","giraf"],
    ["4",np.nan,"ant"]],columns=["Day","Cat1","Cat2"])

In: df["Cat1"].combine_first(df["Cat2"])
Out: 
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object

Compare with other answers:

%timeit df["Cat1"].combine_first(df["Cat2"])
181 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df['Cat1'].fillna(df['Cat2'])
253 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)
88.1 µs ± 793 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I didn't use this method below:

def is_missing(Cat1,Cat2):    
    if np.isnan(Cat1):        
        return Cat2
    else:
        return Cat1

df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)

because it will raise an Exception:

TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')

which means np.isnan can be applied to NumPy arrays of native dtype (such as np.float64), but raises TypeError when applied to object arrays.

So I revise the method:

def is_missing(Cat1,Cat2):    
    if pd.isnull(Cat1):        
        return Cat2
    else:
        return Cat1

%timeit df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)
701 µs ± 7.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

score 1 · Answer 5 · answered Jun 17 '22 at 18:00

1

Alternatively you can also use mask:

In [8]: df.assign(Cat1=df['Cat1'].mask(df['Cat1'].isna(), df['Cat2']))
Out[8]: 
   Day Cat1      Cat2
0    1  cat     mouse
1    2  dog  elephant
2    3  cat     giraf
3    4  ant       ant

answered Jun 17 '22 at 18:00

rachwa

1,805
1
14
17

score 0 · Answer 6 · answered Jul 18 '18 at 19:20

0

Here is a more general approach (fillna method is probably better)

def is_missing(Cat1,Cat2):    
    if np.isnan(Cat1):        
        return Cat2
    else:
        return Cat1

df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)

answered Jul 18 '18 at 19:20

sparrow

10,794
12
54
74

score 0 · Answer 7 · answered Jun 04 '20 at 20:59

I know this is an old question, but I had a need for doing something similar recently. I was able to use the following:

df = pd.DataFrame([["1","cat","mouse"],
    ["2","dog","elephant"],
    ["3","cat","giraf"],
    ["4",np.nan,"ant"]],columns=["Day","Cat1","Cat2"])

print(df)

  Day Cat1      Cat2
0   1  cat     mouse
1   2  dog  elephant
2   3  cat     giraf
3   4  NaN       ant

df1 = df.bfill(axis=1).iloc[:, 1]
df1 = df1.to_frame()
print(df1)

Which yields:

  Cat1
0  cat
1  dog
2  cat
3  ant

Hope this is helpful to someone!

How to pass another entire column as argument to pandas fillna()

7 Answers7

Linked

Related