To complement the other answers, here is an example.
>>> import pandas as pd
>>> ser = pd.Series([1999, 2000, 2001, 2002, 2003])
>>> ser
0 1999
1 2000
2 2001
3 2002
4 2003
dtype: int64
Meaning of ser > 2000
>>> ser > 2000
0 False
1 False
2 True
3 True
4 True
dtype: bool
As you can see, ser > 2000 returns a Series itself, with True or False values depending on whether the condition matched.
There are several ways to then use that condition.
The mask function
mask accepts the condition and returns a new Series in which the values where the condition is True are "replaced" with the provided value (the original series won't change unless you set inplace). (See also Mask User Guide section)
>>> ser.mask(ser > 2000, 2000)
0 1999
1 2000
2 2000
3 2000
4 2000
dtype: int64
That is somewhat equivalent to:
>>> [(2000 if x > 2000 else x) for x in ser]
[1999, 2000, 2000, 2000, 2000]
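As a quick check that mask leaves the original series alone, here is a minimal sketch as a standalone script. The names cond and capped are only illustrative, not part of the question.

```python
import pandas as pd

ser = pd.Series([1999, 2000, 2001, 2002, 2003])

# Boolean Series marking the values that should be replaced
cond = ser > 2000

# mask returns a new Series; ser itself is not modified
capped = ser.mask(cond, 2000)

print(capped.tolist())  # [1999, 2000, 2000, 2000, 2000]
print(ser.tolist())     # [1999, 2000, 2001, 2002, 2003]
```

Storing the condition in a variable first also makes it easy to reuse with the other approaches.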
The where function
where is the inverse of mask, so you need to invert the condition to achieve the same effect. Here the second argument is other, which provides the replacement value where the condition is False. (See also Where User Guide section)
>>> ser.where(ser <= 2000, 2000)
0 1999
1 2000
2 2000
3 2000
4 2000
dtype: int64
That is somewhat equivalent to:
>>> [(x if x <= 2000 else 2000) for x in ser]
[1999, 2000, 2000, 2000, 2000]
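Instead of rewriting the comparison by hand, you can also negate the boolean Series with the ~ operator. A small sketch (the names cond, via_where and via_mask are mine) that checks both calls give the same result:

```python
import pandas as pd

ser = pd.Series([1999, 2000, 2001, 2002, 2003])
cond = ser > 2000

# where keeps values where the condition is True, so negate it with ~
via_where = ser.where(~cond, 2000)

# mask replaces values where the condition is True
via_mask = ser.mask(cond, 2000)

print(via_where.tolist())          # [1999, 2000, 2000, 2000, 2000]
print(via_where.equals(via_mask))  # True
```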
Assignment via boolean indexing
You can also change the series directly via boolean indexing as indicated in other answers (adding for completeness):
>>> ser
0 1999
1 2000
2 2001
3 2002
4 2003
dtype: int64
>>> ser[ser > 2000] = 2000
>>> ser
0 1999
1 2000
2 2000
3 2000
4 2000
dtype: int64
(That would then be equivalent to ser.mask(ser > 2000, 2000, inplace=True))
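If you still need the original values afterwards, one option is to assign into a copy. A minimal sketch, assuming you are fine with the extra memory for the copy; it uses .loc for the boolean selection, and capped is just an illustrative name:

```python
import pandas as pd

ser = pd.Series([1999, 2000, 2001, 2002, 2003])

# Work on a copy so the original stays available for comparison
capped = ser.copy()

# Assign 2000 to every position where the condition is True
capped.loc[capped > 2000] = 2000

print(capped.tolist())  # [1999, 2000, 2000, 2000, 2000]
print(ser.tolist())     # [1999, 2000, 2001, 2002, 2003]
```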
The apply function
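You could also use apply. Unlike mask and where, Series.apply has no inplace parameter; it always returns a new Series, so assign the result back if you want to keep it: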
>>> ser = pd.Series([1999, 2000, 2001, 2002, 2003])
>>> ser.apply(lambda x: 2000 if x > 2000 else x)
0 1999
1 2000
2 2000
3 2000
4 2000
dtype: int64
That allows you to use a regular Python function or expression. But it won't be as efficient for large series as the other examples, because it calls the Python function once per value rather than doing everything within pandas (vectorized).
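If you want to see the difference on your own machine, a rough sketch along these lines should do. The series size, the repeat count and the helper names with_apply and with_mask are arbitrary choices of mine, and the timings will vary by machine and pandas version, so no numbers are shown here.

```python
import timeit

import pandas as pd

# A larger series so the per-element overhead of apply becomes visible
big = pd.Series(range(100_000))


def with_apply():
    # Calls the lambda once per element
    return big.apply(lambda x: 2000 if x > 2000 else x)


def with_mask():
    # Vectorized comparison and replacement
    return big.mask(big > 2000, 2000)


# Both approaches should produce the same values
assert (with_apply() == with_mask()).all()

print("apply:", timeit.timeit(with_apply, number=10))
print("mask: ", timeit.timeit(with_mask, number=10))
```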