I am trying to fill missing values in a slice of a single DataFrame column. Three of the values in the column are NaN because the data is genuinely missing. The other 1,400 or so are NaN only because those homes have no pool. In the first case I want to fill in the most common value (the mode). In the second, I want to encode the missing data as 'NA', which is the appropriate value for a home with no pool.

My code looks like this. It does not work: there are no errors or warnings, but test_df is left unchanged.

test_df.loc[test_df.PoolQC.isna() & (test_df.PoolArea == 0), ['PoolQC']].fillna('NA', inplace=True)
test_df.loc[test_df.PoolQC.isna() & (test_df.PoolArea > 0), ['PoolQC']].fillna(mode, inplace=True)

However, the following code works:

test_df.loc[test_df.PoolQC.isna() & (test_df.PoolArea == 0), ['PoolQC']] = 'NA'
test_df.loc[test_df.PoolQC.isna() & (test_df.PoolArea > 0), ['PoolQC']] = mode

I can't find an explanation for this in the documentation. I don't mind using the workaround, as it's actually shorter, but I'm curious why this happens and what the best practice is in cases like this.

rocksNwaves
  • When you say the first option does not work, does it make no change, return an error, or make only some of the changes? – Roim May 11 '20 at 05:55
  • It seems like you are trying to modify values in a slice (shallow copy) of the dataframe rather than the dataframe itself. pandas is sure to throw a warning in such a case. – tidakdiinginkan May 11 '20 at 06:31
  • @Roim No change, no error, no warning. – rocksNwaves May 11 '20 at 15:46
  • @tidakdiinginkan There is no warning or error. Plus, I am using .loc, which is supposed to address such issues, according to the docs. – rocksNwaves May 11 '20 at 15:57
  • @rocksNwaves check this link - [How to make a slice of DataFrame and “fillna” in specific slice using Python Pandas?](https://stackoverflow.com/questions/47457886/how-to-make-a-slice-of-dataframe-and-fillna-in-specific-slice-using-python-pan) The second attempt is the right way to assign values, but the correct implementation of the first method should be in the link. – tidakdiinginkan May 11 '20 at 20:39
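
The difference between the two attempts can be reproduced with a small sketch. The toy data below is hypothetical (the real test_df is not shown in the question), and `mode` is assumed to be the most frequent non-missing PoolQC value. As far as I understand pandas indexing, selecting rows with a boolean mask through .loc hands back a new object, so an in-place fillna changes only that temporary. Whether a warning is emitted seems to depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for test_df; the values are made up.
test_df = pd.DataFrame({
    "PoolQC":   [np.nan, "Gd", np.nan, "Ex", "Gd", np.nan],
    "PoolArea": [0,      512,  0,      648,  555,  368],
})

# Assumption: `mode` in the question is the most frequent non-missing value.
mode = test_df["PoolQC"].mode()[0]

# Name the two masks once so both attempts select the same rows.
no_pool = test_df["PoolQC"].isna() & (test_df["PoolArea"] == 0)
has_pool = test_df["PoolQC"].isna() & (test_df["PoolArea"] > 0)

# Attempt 1: .loc with a boolean mask, used for reading, returns a copy.
# The in-place fillna fills that temporary copy, which is then discarded.
test_df.loc[no_pool, ["PoolQC"]].fillna("NA", inplace=True)
missing_after_fillna = int(test_df["PoolQC"].isna().sum())

# Attempt 2: .loc on the left-hand side of `=` writes into test_df itself.
test_df.loc[no_pool, "PoolQC"] = "NA"
test_df.loc[has_pool, "PoolQC"] = mode
missing_after_assignment = int(test_df["PoolQC"].isna().sum())

print(missing_after_fillna, missing_after_assignment)
```

With this toy frame, all three NaN values are still present after the first attempt and none remain after the second. If fillna is preferred, assigning its result back (for example `test_df.loc[no_pool, "PoolQC"] = test_df.loc[no_pool, "PoolQC"].fillna("NA")`) should have the same effect as the direct assignment.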
