How to replace a range of values with NaN in Pandas data-frame?

Question

I have a huge data-frame. How should I replace a range of values (-200, -100) with NaN?

Welcome to SO! What have you tried to do so far? (The reason you're getting downvotes might be because you haven't discussed what you've done to solve the problem). — ASGM, Oct 20 '16 at 16:46
@ASGM I get that. I tried my best using multiple for loops, but it took many lines of code. I will keep that in mind. — Mat_python, Oct 20 '16 at 17:07
If the code you've used to solve a problem is really long, it can still be useful to provide at least some of it along with a description of the approach you've taken, and the ways in which it has failed. That gives potential answerers some context, as well as an indication that you've tried to solve it yourself first. — ASGM, Oct 20 '16 at 19:07
I don't think this question is a duplicate of the question listed here. It's the reverse. The linked question asks how to drop NaNs, this asks how to add them. — Danielle Madeley, Jun 29 '18 at 02:25

jpp · Answer 1 · 2018-11-14T17:38:41.567

dataframe

You can use pd.DataFrame.mask:

df.mask((df >= -200) & (df <= -100), inplace=True)

This method replaces elements identified by True values in a Boolean array with a specified value, defaulting to NaN if a value is not specified.

Equivalently, use pd.DataFrame.where with the reverse condition:

df.where((df < -200) | (df > -100), inplace=True)
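To see that the two calls agree, here is a minimal sketch on a made-up three-row frame (the column names and values are assumptions for illustration, not from the question). It uses the default inplace=False so the original frame stays untouched, and both bounds are treated as inclusive, as in the snippets above.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"a": [-250, -200, -150], "b": [-100, -99, 10]})

# True where a value lies in the closed interval [-200, -100].
in_range = (df >= -200) & (df <= -100)

masked = df.mask(in_range)    # NaN where the condition is True
kept = df.where(~in_range)    # keep values where the condition is False

print(masked)
print(masked.equals(kept))    # equals() treats NaNs in the same position as equal
```

Negating the mask with ~ is just a shorter way of writing the reversed condition (df < -200) | (df > -100).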

series

As with many methods, Pandas helpfully includes versions which work with series rather than an entire dataframe. So, for a column df['A'], you can use pd.Series.mask with pd.Series.between:

df['A'].mask(df['A'].between(-200, -100), inplace=True)

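Note that the in-place call above operates on the Series returned by df['A']; recent Pandas versions may warn about this pattern, and with Copy-on-Write enabled it may leave df unchanged. Since inplace=False is the default, assigning the result back is the more reliable form, and it also works when chaining: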

df['A'] = df['A'].mask(df['A'].between(-200, -100))
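As a rough sketch of what chaining could look like, the same Series.mask call can be placed inside DataFrame.assign. The frame, the column B, and the rename step are hypothetical and only there to show a chain.

```python
import pandas as pd

# Hypothetical data: column "A" holds the values to filter, "B" is left alone.
df = pd.DataFrame({"A": [-250, -200, -150, -100, 0], "B": [1, 2, 3, 4, 5]})

result = (
    df
    # between() is inclusive on both ends by default, so -200 and -100 become NaN too.
    .assign(A=lambda d: d["A"].mask(d["A"].between(-200, -100)))
    # Any further step can follow, since assign() returns a new DataFrame.
    .rename(columns={"B": "other"})
)

print(result)
```

Because assign returns a new frame, df itself keeps its original values.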

@jezrael, Yeh MaxU's answer is good, but always good to have variants :) — jpp, Aug 14 '18 at 12:19

MaxU - stand with Ukraine · Accepted Answer · 2020-11-30T12:27:48.357

You can assign NaN through a Boolean mask:

In [145]: df = pd.DataFrame(np.random.randint(-250, 50, (10, 3)), columns=list('abc'))

In [146]: df
Out[146]:
     a    b    c
0 -188  -63 -228
1  -59  -70  -66
2 -110   39 -146
3  -67 -228 -232
4  -22 -180 -140
5 -191 -136 -188
6  -59  -30 -128
7 -201 -244 -195
8 -248  -30  -25
9   11    1   20

In [148]: df[(df>=-200) & (df<=-100)] = np.nan

In [149]: df
Out[149]:
       a      b      c
0    NaN  -63.0 -228.0
1  -59.0  -70.0  -66.0
2    NaN   39.0    NaN
3  -67.0 -228.0 -232.0
4  -22.0    NaN    NaN
5    NaN    NaN    NaN
6  -59.0  -30.0    NaN
7 -201.0 -244.0    NaN
8 -248.0  -30.0  -25.0
9   11.0    1.0   20.0
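The integer columns show up as floats in Out[149] because NaN is a floating-point value, so a NumPy integer column cannot hold it. A small self-contained sketch of the same assignment, using rows 0, 2 and 4 from the output above so that every column receives at least one NaN:

```python
import numpy as np
import pandas as pd

# Rows 0, 2 and 4 of the example frame above.
df = pd.DataFrame({"a": [-188, -110, -22],
                   "b": [-63, 39, -180],
                   "c": [-228, -146, -140]})

print(df.dtypes)   # integer columns before the assignment

# Boolean-mask assignment: every value in [-200, -100] becomes NaN.
df[(df >= -200) & (df <= -100)] = np.nan

print(df)
print(df.dtypes)   # columns that received a NaN are now float64
```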

edited Nov 30 '20 at 12:27

answered Oct 20 '16 at 16:52

Use the following to avoid the `SettingWithCopyWarning` message: `df.loc[:, (df>=-200) & (df<=-100)] = np.nan` – Vishal Jun 04 '18 at 18:09
The code throws an error: `None of [Index([('a',), ('b',), ('c',)], dtype='object')] are in the [columns]` - do you have an idea about the reason (maybe an update) and a solution? :) Thanks a lot in advance! – Ivo Nov 30 '20 at 12:24
@Ivo, thank you for pointing it out - the old solution used to work properly under old Pandas version. It's fixed now - please test... – MaxU - stand with Ukraine Nov 30 '20 at 12:28
