Function that first checks for NaN, then replaces values based on another column

Question

I am trying to define a function that replaces the value in 'BsmtFinType1' based on the value of the column 'MSSubClass'. However, I want it to first check that the value to be replaced is NaN; if the cell already has a value, it should be left alone. I am not sure why my sample doesn't work: after running it, none of the values update. If I remove the first if statement, it runs and replaces values just fine. Any ideas?

# Fill Basement Finishes
def fills_na(x):
    if x['BsmtFinType1'] != np.NaN:
        return x['BsmtFinType1']
    elif x['MSSubClass'] < 60:
        return 'Unf'
    else:
        return 'GLQ'

all_data['BsmtFinType1'] = all_data.apply(lambda x: fills_na(x), axis=1)

https://stackoverflow.com/questions/30357276/pandas-fillna-with-another-column — mooglinux, Aug 13 '18 at 01:42
What do you mean by "it won't work"? Does an error get thrown, or does it simply not modify the values? If it doesn't modify the values, my guess is that x['BsmtFinType1'] is never equal to np.NaN, so something wonky is probably going on there. Test the comparison directly to make sure that != actually behaves the way you expect. — FredMan, Aug 13 '18 at 01:45

score 0 · Answer 1 · answered Aug 13 '18 at 01:55

I ended up modifying the function and got it to work. Instead of giving != np.NaN its own if, I now test for a missing value with pd.isnull and nest the MSSubClass check inside that branch.

# Fill Basement Finishes
def fills_na(cols):
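    # Look the values up by column label; positional access such as
    # cols[0] on a labelled row is deprecated in recent pandas versions
    fin = cols['BsmtFinType1']
    subclass = cols['MSSubClass']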
    if pd.isnull(fin):
        if subclass < 60:
            return 'Unf'
        else:
            return 'GLQ'
    else:
        return fin

all_data['BsmtFinType1'] = all_data[['BsmtFinType1','MSSubClass']].apply(fills_na, axis = 1)
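
For larger frames, the same rule can also be written without a row-wise apply. The following is only a sketch of one alternative, not the approach used in this answer: the small all_data frame and the name default are made up for illustration, and the threshold of 60 is taken from the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real all_data frame
all_data = pd.DataFrame({
    'BsmtFinType1': ['ALQ', np.nan, np.nan, 'Rec'],
    'MSSubClass': [20, 50, 60, 120],
})

# Candidate fill value for every row, chosen from MSSubClass alone
default = pd.Series(
    np.where(all_data['MSSubClass'] < 60, 'Unf', 'GLQ'),
    index=all_data.index,
)

# fillna only touches the missing entries; existing values are kept
all_data['BsmtFinType1'] = all_data['BsmtFinType1'].fillna(default)
print(all_data['BsmtFinType1'].tolist())
```

Because fillna aligns the replacement Series on the index, rows that already hold a value are left untouched, which is the same behaviour as the nested if.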

score 0 · Answer 2 · edited Aug 13 '18 at 02:05


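Okay, I opened up a Python interactive environment and found that != does not work here; you need a dedicated NaN test instead. ~np.isnan(x['BsmtFinType1']) works for numeric values, but this column holds strings, for which np.isnan raises a TypeError, so pd.notnull(x['BsmtFinType1']) is the safer check.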

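The cause is that NaN never compares equal to anything, including itself, so x['BsmtFinType1'] != np.NaN is True for every row and the function always returns the existing value. The DataFrame also stores its NaN as a different type than np.NaN, although that is not what breaks the comparison: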

>>> type(np.NaN) 
<class 'float'>

and

>>> type(df['one']['d']) 
<class 'numpy.float64'>
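
A quick way to see the comparison behaviour is a sketch like the one below. It uses float('nan') rather than np.NaN because the np.NaN alias was removed in NumPy 2.0 (np.nan still works); the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

nan = float('nan')

# NaN never compares equal to anything, itself included,
# so "!=" against NaN is True for every value.
nan_vs_nan = nan != nan
str_vs_nan = 'Unf' != nan
np_vs_nan = np.float64('nan') != nan
print(nan_vs_nan, str_vs_nan, bool(np_vs_nan))

# pd.isnull / pd.notnull accept floats and strings alike
print(pd.isnull(nan), pd.notnull('Unf'))

# np.isnan only accepts numeric input; a string raises TypeError
try:
    np.isnan('Unf')
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)
```

So the type of the NaN does not matter for the comparison; what matters is using a function that tests for missing values instead of an equality operator.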

edited Aug 13 '18 at 02:05 by Lev Zakharov · answered Aug 13 '18 at 01:59 by FredMan
