I have a huge data-frame. How should I replace a range of values (-200, -100) with NaN?
Viewed 2.1k times
7
-
Welcome to SO! What have you tried to do so far? (The reason you're getting downvotes might be because you haven't discussed what you've done to solve the problem). – ASGM Oct 20 '16 at 16:46
-
@ASGM I get that. I tried my best using multiple for loop but it was with many lines of code. I will keep that in mind. – Mat_python Oct 20 '16 at 17:07
-
If the code you've used to solve a problem is really long, it can still be useful to provide at least some of it along with a description of the approach you've taken, and the ways in which it has failed. That gives potential answerers some context, as well as an indication that you've tried to solve it yourself first. – ASGM Oct 20 '16 at 19:07
-
I don't think this question is a duplicate of the question listed here. It's the reverse. The linked question asks how to drop NaNs, this asks how to add them. – Danielle Madeley Jun 29 '18 at 02:25
2 Answers
17
dataframe
You can use pd.DataFrame.mask:
df.mask((df >= -200) & (df <= -100), inplace=True)
This method replaces elements identified by True values in a Boolean array with a specified value, defaulting to NaN if a value is not specified.
Equivalently, use pd.DataFrame.where with the reverse condition:
df.where((df < -200) | (df > -100), inplace=True)
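To make the mask/where relationship concrete, here is a minimal sketch on a small made-up frame (the column names and values are invented for illustration, not taken from the question). It uses the non-inplace forms so the original frame stays available for comparison:

```python
import pandas as pd

# Small illustrative frame; the values are invented for this sketch.
df = pd.DataFrame({"a": [-250, -150, -100, 10], "b": [-200, -99, 0, -175]})

# True wherever a value lies in the closed interval [-200, -100].
in_range = (df >= -200) & (df <= -100)

masked = df.mask(in_range)     # True cells become NaN (the default replacement)
kept = df.where(~in_range)     # same result, expressed with the negated condition
zeroed = df.mask(in_range, 0)  # an explicit replacement value instead of NaN

print(masked)
print(masked.equals(kept))
```

Both comparisons here are inclusive, so -200 and -100 themselves are replaced. If the range is meant to be open, switch to strict comparisons (> and <).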
series
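As with many methods, Pandas includes versions that work on a series rather than an entire dataframe. So, for a column df['A'], you can use pd.Series.mask with pd.Series.between:
df['A'].mask(df['A'].between(-200, -100), inplace=True)
Note that inplace=False by default, so you can also assign the result back, which allows chaining:
df['A'] = df['A'].mask(df['A'].between(-200, -100))
The assignment form is the safer of the two: on recent Pandas versions with Copy-on-Write enabled, the inplace call operates on a temporary copy of the column and leaves df unchanged.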
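A short sketch of the single-column case, assuming a made-up frame with columns A and B (both invented for illustration). It uses the assign-back form and shows that only the selected column changes:

```python
import pandas as pd

# Illustrative data; column names and values are assumptions for this sketch.
df = pd.DataFrame({"A": [-250, -200, -150, -100, -50], "B": [1, 2, 3, 4, 5]})

# between() includes both endpoints by default.
hits = df["A"].between(-200, -100)

# Assign the masked column back; column B is left untouched.
df["A"] = df["A"].mask(hits)

print(df)
```

Since between() is inclusive by default, the endpoints -200 and -100 are replaced too. Newer Pandas versions (1.3 and later, as far as I know) accept inclusive="neither" if an open interval is wanted.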

jpp
11
You can do it this way:
In [145]: df = pd.DataFrame(np.random.randint(-250, 50, (10, 3)), columns=list('abc'))
In [146]: df
Out[146]:
      a    b    c
0  -188  -63 -228
1   -59  -70  -66
2  -110   39 -146
3   -67 -228 -232
4   -22 -180 -140
5  -191 -136 -188
6   -59  -30 -128
7  -201 -244 -195
8  -248  -30  -25
9    11    1   20
In [148]: df[(df>=-200) & (df<=-100)] = np.nan
In [149]: df
Out[149]:
        a      b      c
0     NaN  -63.0 -228.0
1   -59.0  -70.0  -66.0
2     NaN   39.0    NaN
3   -67.0 -228.0 -232.0
4   -22.0    NaN    NaN
5     NaN    NaN    NaN
6   -59.0  -30.0    NaN
7  -201.0 -244.0    NaN
8  -248.0  -30.0  -25.0
9    11.0    1.0   20.0
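Note the .0 suffixes in the output: NaN is a floating-point value, so the integer columns end up as float64. Below is a script version of the same idea, as a sketch rather than the exact session above. The fixed seed and the explicit astype call are my additions; casting first is an assumption on my part that it avoids relying on an implicit int-to-float upcast during assignment, which newer Pandas versions tend to be stricter about.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is repeatable (the session above used unseeded data).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(-250, 50, (10, 3)), columns=list("abc"))

# Boolean frame marking values in the closed interval [-200, -100].
in_range = (df >= -200) & (df <= -100)
n_hits = int(in_range.to_numpy().sum())

# Cast to float explicitly, then assign NaN through the boolean frame.
out = df.astype("float64")
out[in_range] = np.nan

print(out)
print(out.dtypes)
```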

MaxU - stand with Ukraine
-
Use the following to avoid the `SettingWithCopyWarning` message: `df.loc[:, (df>=-200) & (df<=-100)] = np.nan` – Vishal Jun 04 '18 at 18:09
-
The code throws an error: `None of [Index([('a',), ('b',), ('c',)], dtype='object')] are in the [columns]` - do you have an idea about the reason (maybe an update) and a solution? :) Thanks a lot in advance! – Ivo Nov 30 '20 at 12:24
-
@Ivo, thank you for pointing it out - the old solution used to work properly under older Pandas versions. It's fixed now - please test... – MaxU - stand with Ukraine Nov 30 '20 at 12:28