I am trying to define a function that replaces the value in 'BsmtFinType1' based on the value of the column 'MSSubClass'. However, I want it to first check that the value to be replaced is NaN; if the cell already has a value, it should be left alone. I am not sure why my sample doesn't work: it runs, but none of the values update. If I remove the first `if` statement, it runs and replaces values just fine. Any ideas?

import numpy as np

# Fill Basement Finishes
def fills_na(x):
    if x['BsmtFinType1'] != np.nan:
        return x['BsmtFinType1']
    elif x['MSSubClass'] < 60:
        return 'Unf'
    else:
        return 'GLQ'

all_data['BsmtFinType1'] = all_data.apply(fills_na, axis=1)
Alex
  • Is `all_data` a numpy array or a python array? – mooglinux Aug 13 '18 at 01:39
  • it is a pandas dataframe – Alex Aug 13 '18 at 01:40
  • https://stackoverflow.com/questions/30357276/pandas-fillna-with-another-column – mooglinux Aug 13 '18 at 01:42
  • What do you mean by "it won't work." Does an error get thrown, or does it simply not modify the values? If it doesn't modify the values then my guess would be that x['BsmtFinType1'] is never equivalent to np.NaN, meaning there's probably something wonky going on there, check for equivalency directly to make sure that != should actually work. – FredMan Aug 13 '18 at 01:45
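Following the suggestion in the last comment to check the comparison directly, here is a small sketch (assuming only numpy and pandas are installed; the one-element sample is made up). It shows that NaN compares unequal to everything, itself included, so the first `if` in the question is always taken and every row keeps its existing value:

```python
import numpy as np
import pandas as pd

# A missing entry as pandas stores it in an object (string) column
missing = pd.Series(['Unf', np.nan]).iloc[1]

# NaN never compares equal to anything, another NaN included (IEEE 754),
# so `!=` is True even when the value really is missing.
print(missing != np.nan)    # True
print(np.nan != np.nan)     # True

# A dedicated missing-value test gives the intended answer.
print(pd.isnull(missing))   # True
print(pd.isnull('Unf'))     # False
```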

2 Answers

I ended up modifying the function and got it to work. Instead of comparing with `!= np.nan` in its own `if`, I test the value with `pd.isnull` and nest the subclass check inside that branch.

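import pandas as pd

# Fill Basement Finishes
def fills_na(cols):
    # cols is one row holding BsmtFinType1 and MSSubClass, in that order
    fin = cols.iloc[0]
    subclass = cols.iloc[1]
    if pd.isnull(fin):
        # Missing finish: choose a default from the building subclass
        if subclass < 60:
            return 'Unf'
        else:
            return 'GLQ'
    else:
        # Existing value is kept unchanged
        return fin

all_data['BsmtFinType1'] = all_data[['BsmtFinType1', 'MSSubClass']].apply(fills_na, axis=1)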
Alex
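The row-wise `apply` in the answer above works, but the same fill can also be written without a per-row Python function, in the spirit of the `fillna` link in the comments. This is a minimal sketch, assuming `all_data` has the same two columns; the small frame below is made-up sample data, not the asker's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for all_data
all_data = pd.DataFrame({
    'BsmtFinType1': ['GLQ', np.nan, 'ALQ', np.nan],
    'MSSubClass':   [20,    30,     70,    80],
})

# Default finish chosen from MSSubClass, computed for every row at once
default_fin = pd.Series(
    np.where(all_data['MSSubClass'] < 60, 'Unf', 'GLQ'),
    index=all_data.index,
)

# fillna only touches missing entries (aligned by index); existing values stay
all_data['BsmtFinType1'] = all_data['BsmtFinType1'].fillna(default_fin)
print(all_data['BsmtFinType1'].tolist())  # ['GLQ', 'Unf', 'ALQ', 'GLQ']
```

Because `fillna` aligns on the index, each missing cell receives the default computed for its own row, which matches the nested-`if` logic of the function above.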
Okay, I opened up a Python interactive environment and found that you need to test the value with `np.isnan(x['BsmtFinType1'])` (negated, e.g. `not np.isnan(...)`) instead of comparing it with `!= np.nan`.

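The DataFrame and NumPy do store NaN as different types, as shown below, but that is not what breaks the comparison: NaN never compares equal to anything, not even another NaN, so `x['BsmtFinType1'] != np.nan` is True for every row regardless of type.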

>>> type(np.nan)
<class 'float'>

and

>>> type(df['one']['d'])
<class 'numpy.float64'>
Lev Zakharov
FredMan
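One caveat with the `np.isnan` approach: it is a numeric function, while `BsmtFinType1` holds strings such as `'Unf'`, so it raises `TypeError` on any row that already has a value. `pd.isnull` (alias `pd.isna`) handles both strings and NaN. Also, `~` only acts as a logical NOT on NumPy booleans; on a plain Python `bool` it is bitwise (`~True == -2`, still truthy), so `not` is the safer negation. A sketch with made-up sample values (the helper `isnan_or_none` is hypothetical, written only for this comparison):

```python
import numpy as np
import pandas as pd

def isnan_or_none(value):
    """Return np.isnan(value) as a bool, or None if np.isnan rejects the input."""
    try:
        return bool(np.isnan(value))
    except TypeError:
        # np.isnan is a numeric ufunc; strings such as 'Unf' are not supported
        return None

# Sample values: a present string, Python's NaN, and NumPy's float64 NaN
values = ['Unf', np.nan, np.float64('nan')]

np_results = [isnan_or_none(v) for v in values]
pd_results = [bool(pd.isnull(v)) for v in values]

print(np_results)  # [None, True, True]
print(pd_results)  # [False, True, True]
```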