
I was replacing values in columns and noticed that if I use mask on the whole dataframe, it produces the expected result, but if I use it on selected columns via .loc, it doesn't change any value.

Can you explain why, and tell me whether this is the expected result?

You can reproduce it with a dataframe `dt` that contains zeros in its columns:

import numpy as np
import pandas as pd

dt = pd.DataFrame(np.random.randint(0, 3, size=(10, 3)), columns=list('ABC'))
dt.mask(lambda x: x == 0, np.nan, inplace=True)
# replaces every zero with NaN, as expected

But:

dt = pd.DataFrame(np.random.randint(0, 3, size=(10, 3)), columns=list('ABC'))
columns = list('BC')
dt.loc[:, columns].mask(lambda x: x == 0, np.nan, inplace=True)
# doesn't change anything; I expect the zeros in columns B and C to be replaced
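A minimal reproducible sketch of the two snippets above, assuming a small hand-written frame instead of random integers (float values, so NaN fits without a dtype change). The helper `make_frame` and the variable names are illustrative only. The in-place mask edits whatever object it is called on, and after `.loc[:, columns]` that object is a temporary new DataFrame rather than `dt`:

```python
import numpy as np
import pandas as pd


def make_frame():
    # Hypothetical deterministic data; floats so NaN fits without a dtype change
    return pd.DataFrame({"A": [0.0, 1.0, 2.0],
                         "B": [1.0, 0.0, 2.0],
                         "C": [0.0, 0.0, 1.0]})


columns = ["B", "C"]

whole = make_frame()
whole.mask(lambda x: x == 0, np.nan, inplace=True)  # edits `whole` itself

selected = make_frame()
# .loc[:, columns] builds a new, unnamed DataFrame; the in-place edit
# happens on that temporary object, which is then discarded
selected.loc[:, columns].mask(lambda x: x == 0, np.nan, inplace=True)

print(int(whole.isna().sum().sum()))     # 4
print(int(selected.isna().sum().sum()))  # 0
```

Depending on the pandas version, the second call may also emit a warning about setting values on a copy or on a chained selection.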
user305883

2 Answers


I guess it's because `DataFrame.loc` only gives you access to a slice of your dataframe: selecting columns with a list of labels returns a new DataFrame (a copy), so you are masking that copy and the original data is not affected.

You can try this instead:

dt[columns] = dt[columns].mask(dt[columns] == 0)
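A runnable version of this fix, assuming small hand-written integer data in place of the random frame (the values are hypothetical). `mask` returns a new frame in which the zeros become NaN (when `other` is not given, NaN is used), and the assignment `dt[columns] = ...` stores those columns back in `dt`. Column `A` keeps its zero, and because NaN cannot be stored in an integer column, `B` and `C` come back as float64:

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic data in place of the random frame
dt = pd.DataFrame({"A": [0, 1, 2], "B": [1, 0, 2], "C": [0, 0, 1]})
columns = ["B", "C"]

# mask() returns a new frame; the assignment writes it back into dt
dt[columns] = dt[columns].mask(dt[columns] == 0)

print(dt)
```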
SergFSM
  • Hi @SergFSM, can you clarify: I was trying `dt[columns].mask(lambda x: x == 0, np.nan, inplace=True)` and expected in-place access to the data, i.e. masking the very same dataset rather than a copy. But your solution works. Can you explain why? – user305883 Sep 07 '22 at 20:37
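  • @user305883, `df[]` behaves the same way as `df.loc[]` here: selecting a list of columns with either one gives you a copy. The difference is that `[]` selects columns, while `loc` selects cells (rows and columns) – SergFSM Sep 08 '22 at 06:54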
  • Hi @SergFSM, thank you. I was prompted to accept an answer. Yours is concise and perfectly fine for me, but in the end I accepted the other one for the explanations and links its author added. I still wanted to write a comment to say thank you, and I hope you don't mind! – user305883 Sep 08 '22 at 11:55
  • @user305883, no worries, it's OK – SergFSM Sep 08 '22 at 13:05
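A small sketch (hypothetical data) of the point made in this thread: with a list of labels, `dt[columns]` also hands back a new DataFrame, so an in-place mask on it leaves `dt` as it was. The warning filter is there only because, depending on the pandas version, this pattern may emit a `SettingWithCopyWarning` or a chained-assignment warning:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data; floats so NaN fits without a dtype change
dt = pd.DataFrame({"A": [0.0, 1.0], "B": [0.0, 2.0], "C": [3.0, 0.0]})
columns = ["B", "C"]

with warnings.catch_warnings():
    # Silence version-dependent copy/chained-assignment warnings
    warnings.simplefilter("ignore")
    # The bracket selection is a new object; the in-place edit lands there
    dt[columns].mask(lambda x: x == 0, np.nan, inplace=True)

print(dt)  # still no NaN: dt itself was never modified
```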

Selecting with `loc` (here with a list of columns) returns a copy of the dataframe. You then apply `mask` to this copy, and it performs the operation in place on the copy's data. In a one-liner that copy is never bound to a name, so it is discarded right away and the result is inaccessible. To keep a reference to it, split the code into two lines:

tmp = dt.loc[:, columns]
tmp.mask(tmp[columns] == 0, np.nan, inplace=True)

and then write it back into the original dataframe:

dt[columns] = tmp
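The three lines above can be run end to end as follows, assuming a small hand-written float frame (hypothetical values). The counts suggest that `dt` only changes once `tmp` is assigned back:

```python
import numpy as np
import pandas as pd

# Hypothetical data; floats so NaN fits without a dtype change
dt = pd.DataFrame({"A": [0.0, 1.0, 2.0],
                   "B": [1.0, 0.0, 2.0],
                   "C": [0.0, 0.0, 1.0]})
columns = ["B", "C"]

# Keep a reference to the copy, then mask it in place
tmp = dt.loc[:, columns]
tmp.mask(tmp[columns] == 0, np.nan, inplace=True)
nan_before = int(dt[columns].isna().sum().sum())  # dt not updated yet

# Write the edited copy back into the original frame
dt[columns] = tmp
nan_after = int(dt[columns].isna().sum().sum())

print(nan_before, nan_after)  # 0 3
```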

If you don't use the in-place update of `mask`, on the other hand, you can do everything in one line of code:

dt[columns] = dt.loc[:, columns].mask(dt[columns] == 0, np.nan, inplace=False)
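The `inplace` flag also changes what `mask` returns: with `inplace=False` (the default) you get a new DataFrame, while `inplace=True` edits the object it is called on and returns `None`. That is why only the `inplace=False` form can sit on the right-hand side of an assignment. A minimal check with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data; floats so NaN fits without a dtype change
dt = pd.DataFrame({"B": [1.0, 0.0], "C": [0.0, 3.0]})

# Default (inplace=False): a new frame is returned, dt is left alone
masked = dt.mask(dt == 0, np.nan)

# inplace=True: dt itself is modified and the method returns None
returned = dt.mask(dt == 0, np.nan, inplace=True)

print(type(masked).__name__, returned)
```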

Extra: If you want to better understand the `inplace` parameter in pandas, I recommend reading these posts:

Massifox
  • Thank you for the explanation. Can you clarify this: `dt[columns].mask(lambda x: x == 0, np.nan, inplace=True)` gives `SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame`, while `inplace=False` gives no warning. Why? I would expect no warning, because `inplace=True` does not make a copy. It's confusing... So you are saying one problem is the use of `inplace` and the other is the use of `loc`? Can I avoid using `loc`? (see the other answer) – user305883 Sep 07 '22 at 20:31
  • With `dt[columns]` exactly the same reasoning applies as for `dt.loc[:, columns]`, so you need two passes using a temporary variable. I will add a link in the "Extra" section that better explains indexing with `loc` and with bracket notation. – Massifox Sep 08 '22 at 11:43