Starting from pandas 0.20 ix is deprecated. The right way is to use df.loc
here is a working example
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A":[0,1,0], "B":[2,0,5]}, columns=list('AB'))
>>> df.loc[df.A == 0, 'B'] = np.nan
>>> df
A B
0 0 NaN
1 1 0
2 0 NaN
>>>
Explanation:
As explained in the doc here, .loc
is primarily label based, but may also be used with a boolean array.
So, what we are doing above is applying df.loc[row_index, column_index]
by:
- Exploiting the fact that
loc
can take a boolean array as a mask that tells pandas which subset of rows we want to change in row_index
- Exploiting the fact
loc
is also label based to select the column using the label 'B'
in the column_index
We can use logical, condition or any operation that returns a series of booleans to construct the array of booleans. In the above example, we want any rows
that contain a 0
, for that we can use df.A == 0
, as you can see in the example below, this returns a series of booleans.
>>> df = pd.DataFrame({"A":[0,1,0], "B":[2,0,5]}, columns=list('AB'))
>>> df
A B
0 0 2
1 1 0
2 0 5
>>> df.A == 0
0 True
1 False
2 True
Name: A, dtype: bool
>>>
Then, we use the above array of booleans to select and modify the necessary rows:
>>> df.loc[df.A == 0, 'B'] = np.nan
>>> df
A B
0 0 NaN
1 1 0
2 0 NaN
For more information check the advanced indexing documentation here.