Conditional Replace Pandas

Question

I have a DataFrame, and I want to replace the values in a particular column that exceed a value with zero. I had thought this was a way of achieving this:

df[df.my_channel > 20000].my_channel = 0

If I copy the channel into a new data frame it's simple:

df2 = df.my_channel 

df2[df2 > 20000] = 0

This does exactly what I want, but seems not to work with the channel as part of the original DataFrame.

Found what I think you were looking for [here](http://stackoverflow.com/q/37841525/2662901). — feetwet, Jan 05 '17 at 20:21

score 273 · Accepted Answer · edited May 27 '19 at 05:18

.ix indexer works okay for pandas version prior to 0.20.0, but since pandas 0.20.0, the .ix indexer is deprecated, so you should avoid using it. Instead, you can use .loc or iloc indexers. You can solve this problem by:

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

Or, in one line,

df.loc[df.my_channel > 20000, 'my_channel'] = 0

mask helps you to select the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 to the selected rows where maskholds in the column which name is column_name.

Update: In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.

lowtech · Answer 2 · 2019-01-26T21:40:51.607

102

Try

df.loc[df.my_channel > 20000, 'my_channel'] = 0

Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.

edited Jan 26 '19 at 21:40

answered Feb 06 '14 at 16:24

lowtech

2,492
2
22
31

10

Thank you. I found my own solution too, which was: df.my_channel[df.my_channel >20000] = 0 – BMichell Feb 06 '14 at 16:40
2

@BMichell I think your solution might start giving you warnings in 0.13, didn't have a chance to try yet – lowtech Feb 06 '14 at 19:14
yield error: /opt/anaconda3/envs/python35/lib/python3.5/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy """Entry point for launching an IPython kernel. – Rutger Hofste Oct 10 '17 at 15:25
@RutgerHofste thanks for mentioning that, yet another argument never use Python3 – lowtech Oct 10 '17 at 15:29

score 76 · Answer 3 · edited Dec 04 '19 at 12:40

76

np.where function works as follows:

df['X'] = np.where(df['Y']>=50, 'yes', 'no')

In your case you would want:

import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)

edited Dec 04 '19 at 12:40

fpersyn

1,045
1
12
19

answered Feb 14 '18 at 20:41

seeiespi

3,628
2
35
37

jpp · Answer 4 · 2018-12-11T11:19:23.747

The reason your original dataframe does not update is because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:

When setting values in a pandas object, care must be taken to avoid what is called chained indexing.

You have a few alternatives:-

`loc` + Boolean indexing

loc may be used for setting values and supports Boolean masks:

df.loc[df['my_channel'] > 20000, 'my_channel'] = 0

`mask` + Boolean indexing

You can assign to your series:

df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)

Or you can update your series in place:

df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)

`np.where` + Boolean indexing

You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.

df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])

@Egidius I have used `np.select` if having multiple conditions, or otherwise the `&` and `|` operators in Pandas. — Asclepius, Dec 26 '21 at 23:33

score 6 · Answer 5 · answered Jul 05 '19 at 14:24

6

Try this:

df.my_channel = df.my_channel.where(df.my_channel <= 20000, other= 0)

or

df.my_channel = df.my_channel.mask(df.my_channel > 20000, other= 0)

answered Jul 05 '19 at 14:24

R. Shams

76
1
5

score 4 · Answer 6 · answered Nov 10 '18 at 04:18

4

I would use lambda function on a Series of a DataFrame like this:

f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)

I do not assert that this is an efficient way, but it works fine.

answered Nov 10 '18 at 04:18

Ozkan Serttas

947
13
14

5

This is inefficient and not recommended as it involves a Python-level loop in a row-wise operation. – jpp Nov 10 '18 at 04:49
Thank you, I guess we can use `loc` here, like `df.loc[: , 'my_column'] = df['my_column'].map(f)` . I do not know if it is fast like the ones you added below. – Ozkan Serttas Nov 10 '18 at 04:56
2

Nope, still slow as you are still operating row-wise rather than column-wise. – jpp Nov 10 '18 at 05:00

score 0 · Answer 7 · answered Aug 23 '22 at 13:44

0

I would like to do the same operation but by comparing the dataframe value with a list:

df.loc[df['value'] in [1,2,3], 'another_column'] = 'yes'

so far I get the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

but if I try any([1,2,3]) I then get the error:

TypeError: argument of type 'bool' is not iterable

answered Aug 23 '22 at 13:44

sam47

11
2

If you have a new question, please ask it by clicking the [Ask Question](https://stackoverflow.com/questions/ask) button. Include a link to this question if it helps provide context. - [From Review](/review/late-answers/32552503) – Christian Geier Aug 29 '22 at 17:08

Conditional Replace Pandas

7 Answers7

`loc` + Boolean indexing

`mask` + Boolean indexing

`np.where` + Boolean indexing

Linked

Related

Conditional Replace Pandas

7 Answers7

loc + Boolean indexing

mask + Boolean indexing

np.where + Boolean indexing

Linked

Related

`loc` + Boolean indexing

`mask` + Boolean indexing

`np.where` + Boolean indexing