I would like to know if there is someway of replacing all DataFrame negative numbers by zeros?
9 Answers
If all your columns are numeric, you can use boolean indexing:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [3]: df
Out[3]:
a b
0 0 -3
1 -1 2
2 2 1
In [4]: df[df < 0] = 0
In [5]: df
Out[5]:
a b
0 0 0
1 0 2
2 2 1
For the more general case, this answer shows the private method _get_numeric_data
:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
'c': ['foo', 'goo', 'bar']})
In [3]: df
Out[3]:
a b c
0 0 -3 foo
1 -1 2 goo
2 2 1 bar
In [4]: num = df._get_numeric_data()
In [5]: num[num < 0] = 0
In [6]: df
Out[6]:
a b c
0 0 0 foo
1 0 2 goo
2 2 1 bar
With timedelta
type, boolean indexing seems to work on separate columns, but not on the whole dataframe. So you can do:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
...: 'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df
Out[3]:
a b
0 0 days -3 days
1 -1 days 2 days
2 2 days 1 days
In [4]: for k, v in df.iteritems():
...: v[v < 0] = 0
...:
In [5]: df
Out[5]:
a b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
Update: comparison with a pd.Timedelta
works on the whole DataFrame:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
...: 'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df[df < pd.Timedelta(0)] = 0
In [4]: df
Out[4]:
a b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days

- 1
- 1

- 63,701
- 20
- 147
- 175
Another succinct way of doing this is pandas.DataFrame.clip.
For example:
import pandas as pd
In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})
In [21]: df
Out[21]:
a
0 -1
1 100
2 -2
In [22]: df.clip(lower=0)
Out[22]:
a
0 0
1 100
2 0
-
1
-
19If you only want to apply `clip` on a specific column you can go like `df['col_name'] = df['col_name'].clip(lower=0)` – gies0r Oct 04 '19 at 09:01
-
6`clip_lower` has been [deprecated](https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/api/pandas.DataFrame.clip_lower.html) so rather stick to `df.clip(lower=0)` – Sally Levesque Jun 21 '20 at 21:35
-
-
@DomingoR said this is inline, but this doesn't edit the values inplace, you need to assign to the same column to have the clip work. – Diedre Feb 16 '23 at 23:54
-
Another clean option that I have found useful is pandas.DataFrame.mask which will "replace values where the condition is true."
Create the DataFrame:
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [4]: df
Out[4]:
a b
0 0 -3
1 -1 2
2 2 1
Replace negative numbers with 0:
In [5]: df.mask(df < 0, 0)
Out[5]:
a b
0 0 0
1 0 2
2 2 1
Or, replace negative numbers with NaN, which I frequently need:
In [7]: df.mask(df < 0)
Out[7]:
a b
0 0.0 NaN
1 NaN 2.0
2 2.0 1.0

- 829
- 1
- 7
- 14
Perhaps you could use pandas.where(args)
like so:
data_frame = data_frame.where(data_frame < 0, 0)

- 4,972
- 8
- 30
- 47
With lambda function
df['column'] = df['column'].apply(lambda x : x if x > 0 else 0)

- 4,446
- 2
- 39
- 46
If you are dealing with a large df (40m x 700 in my case) it works much faster and memory savvy through iteration on columns with something like.
for col in df.columns:
df[col][df[col] < 0] = 0

- 47,830
- 31
- 106
- 135

- 97
- 4
-
You get a *A value is trying to be set on a copy of a slice from a DataFrame* warning when you do this – ajrlewis Jul 05 '19 at 10:17
-
A slight modification of the answers present.
Let's identify all the numeric columns and create a dataframe with all numeric values. Then replace the negative values with NaN in new dataframe
df_numeric = df.select_dtypes(include=[np.number])
df_numeric = df_numeric.where(lambda x: x > 0, np.nan)
Now, drop the columns where negative values are handled in the main data frame and then concatenate the new column values to the main data frame
numeric_cols = df_numeric.columns.values
df = df.drop(columns=numeric_cols)
df = pd.concat([df, df_numeric], axis = 1)

- 782
- 1
- 6
- 13
If you have a dataset of mixed data types, also consider moving the non-numerics to the index, updating the data, then removing the index:
df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
'c': ['foo', 'goo', 'bar']})
df = df.set_index('c')
df[df < 0] = 0
df = df.reset_index()
The approach using _get_numeric_data()
didn't work for me for some reason.

- 740
- 1
- 8
- 19
Try this:
df.loc[(df < 0).index, :] = 0
To avoid getting a unpredicatable behavior: Returning a view versus a copy

- 51
- 1
- 3