379

How do I drop nan, inf, and -inf values from a DataFrame without resetting mode.use_inf_as_null?

Can I tell dropna to include inf in its definition of missing values so that the following works?

df.dropna(subset=["col1", "col2"], how="all")
Mateen Ulhaq

10 Answers

678

First replace() infs with NaN:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

and then drop NaNs via dropna():

df.dropna(subset=["col1", "col2"], how="all", inplace=True)

For example:

>>> df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})
>>> df
   col1  col2
0   1.0   2.0
1   inf   3.0
2  -inf   NaN

>>> df.replace([np.inf, -np.inf], np.nan, inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0
2   NaN   NaN

>>> df.dropna(subset=["col1", "col2"], how="all", inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0

The same method also works for Series.
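
If you would rather not mutate df in place (see the comment below about replace returning a new DataFrame), the two steps can also be chained. A minimal sketch reusing the question's column names:

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})

# replace() and dropna() both return new DataFrames, so the result can be
# assigned without touching the original df.
cleaned = (
    df.replace([np.inf, -np.inf], np.nan)
      .dropna(subset=["col1", "col2"], how="all")
)
print(cleaned)
#    col1  col2
# 0   1.0   2.0
# 1   NaN   3.0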

Mateen Ulhaq
Andy Hayden
  • How can one "exchange" the `inf` values to a predefined `int` such as `0`, in a certain column? – 3kstc Apr 11 '18 at 22:45
  • @3kstc use `.replace(..., 0)`. To do it on just some columns, update those columns, i.e. `df[cols] = df[cols].replace(..., 0)` – Andy Hayden Apr 12 '18 at 00:04
  • Maybe it's worth specifying that `replace` does not work in place, so a new `DataFrame` is returned – Marco Jul 18 '19 at 07:53
87

With option context, this is possible without permanently setting use_inf_as_na. For example:

with pd.option_context('mode.use_inf_as_na', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')

Of course it can be set to treat inf as NaN permanently with

pd.set_option('use_inf_as_na', True)

For older versions, replace use_inf_as_na with use_inf_as_null.
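
A self-contained run on the question's sample data (a sketch; note that under this option dropna only decides which rows to drop, the surviving inf values are left unchanged):

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})

# The option is active only inside the block and restored automatically on exit.
with pd.option_context('mode.use_inf_as_na', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')

print(df)
#    col1  col2
# 0   1.0   2.0
# 1   inf   3.0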

ayhan
  • This is the most readable answer and is consequently the best, even though it violates in letter (but not in spirit) the original question. – ijoseph Nov 21 '17 at 21:29
  • Pandas as of (at least) 0.24: `use_inf_as_null` has been deprecated and will be removed in a future version. Use `use_inf_as_na` instead. Add to/update answer? – Håkon T. Jul 25 '19 at 07:14
  • This one is a better choice for treating `inf` as null at the global-setting level instead of at the operation level. It could potentially save time spent imputing the values first. – TaoPR Oct 01 '19 at 14:35
30

Use (fast and simple):

df = df[np.isfinite(df).all(1)]

This answer is based on DougR's answer to another question. Here is an example:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, 3, np.nan, 4, np.inf, 5, -np.inf, 6])
print('Input:\n', df, sep='')

df = df[np.isfinite(df).all(1)]
print('\nDropped:\n', df, sep='')

Result:

Input:
    0
0  1.0000
1  2.0000
2  3.0000
3     NaN
4  4.0000
5     inf
6  5.0000
7    -inf
8  6.0000

Dropped:
     0
0  1.0
1  2.0
2  3.0
4  4.0
6  5.0
8  6.0
Markus Dutschke
  • I am getting this error – TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' –  Sep 26 '20 at 08:19
  • Not with my code, I guess!? You are probably trying to process a column with unsupported types, such as strings. – Markus Dutschke Sep 26 '20 at 09:48
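
Regarding the TypeError above: if the frame also contains non-numeric columns, one option is to build the finite mask from the numeric columns only. A sketch (the column names are purely illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame({"num": [1.0, np.inf, np.nan, 4.0], "txt": list("abcd")})

# Build the mask only from numeric columns, then filter the full frame.
numeric = df.select_dtypes(include=[np.number])
df = df[np.isfinite(numeric).all(1)]
print(df)
#    num txt
# 0  1.0   a
# 3  4.0   d
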
18

Here is another method using .loc to replace inf with nan on a Series:

s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan

So, in response to the original question:

df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))

for i in range(3): 
    df.iat[i, i] = np.inf

df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf

df.sum()
A    inf
B    inf
C    inf
dtype: float64

df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64
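
Spelled out on a concrete Series, the .loc line from the top of this answer looks like this (a sketch):

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, -np.inf, np.nan, 5.0])

# Only positions that are non-finite but not already NaN are overwritten.
s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan
print(s.dropna())
# 0    1.0
# 4    5.0
# dtype: float64
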
Alexander
9

The solutions above also modify infs in columns that are not in the target subset. To limit the replacement to the target columns:

lst = [np.inf, -np.inf]
to_replace = {v: lst for v in ['col1', 'col2']}
df.replace(to_replace, np.nan)
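
For example, with a hypothetical extra column that should keep its infs (a sketch):

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf], "col2": [np.inf, 2.0], "other": [np.inf, 3.0]})

lst = [np.inf, -np.inf]
to_replace = {v: lst for v in ['col1', 'col2']}

# Only col1 and col2 are touched; "other" keeps its inf.
print(df.replace(to_replace, np.nan))
#    col1  col2  other
# 0   1.0   NaN    inf
# 1   NaN   2.0    3.0
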
has2k1
8

Yet another solution is to use the isin method. Use it to determine whether each value is infinite or missing, then chain the all method to determine whether all of the values in a row are infinite or missing.

Finally, use the negation of that result to select the rows that don't have all infinite or missing values via boolean indexing.

all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
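
A worked run on the question's sample data (a sketch); like dropna(how='all'), this only drops rows where every value is inf, -inf, or NaN:

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})

all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
print(df[~all_inf_or_nan])
#    col1  col2
# 0   1.0   2.0
# 1   inf   3.0
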
Ted Petrou
5

You can use pd.DataFrame.mask with np.isinf. First make sure your DataFrame columns are all of float dtype, then use dropna with your existing logic, as in the sketch after the example.

print(df)

       col1      col2
0 -0.441406       inf
1 -0.321105      -inf
2 -0.412857  2.223047
3 -0.356610  2.513048

df = df.mask(np.isinf)

print(df)

       col1      col2
0 -0.441406       NaN
1 -0.321105       NaN
2 -0.412857  2.223047
3 -0.356610  2.513048
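
Chaining the mask with the dropna from the question might look like this (a sketch with made-up values):

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [-0.44, -0.32], "col2": [np.inf, -np.inf]})

# mask() turns every inf/-inf into NaN, then the question's dropna logic applies.
df = df.mask(np.isinf).dropna(subset=["col1", "col2"], how="all")
print(df)
#    col1  col2
# 0 -0.44   NaN
# 1 -0.32   NaN
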
Ian Thompson
jpp
4

To remove both NaN and inf in a single command, use:

df = df[np.isfinite(df).all(axis=1)]

If for some reason the above doesn't work for you, please try the following 2 steps:

df = df[~(df.isnull().any(axis=1))]                  # to remove NaN
df = df[~(df.isin([np.inf, -np.inf]).any(axis=1))]   # to remove inf
Pulkit Bansal
3

Unlike the other answers here, this one-liner worked for me.

import numpy as np

df = df[df['required_column_name'] != np.inf]
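
Note that this only filters positive inf in one column; the same boolean-filter idea can be widened with isin to cover -inf as well. A sketch (the column name is just the placeholder from above):

import numpy as np
import pandas as pd

df = pd.DataFrame({"required_column_name": [1.0, np.inf, -np.inf, 4.0]})

# Keep only rows whose value is neither inf nor -inf.
df = df[~df['required_column_name'].isin([np.inf, -np.inf])]
print(df)
#    required_column_name
# 0                   1.0
# 3                   4.0
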
2

Just stumbled upon this one, and I found a one-liner without replace or numpy:

df = pd.DataFrame(
    [[1, np.inf],
     [1, -np.inf],
     [1, 2]],
    columns=['a', 'b']
)
df.query("b not in [inf, -inf]")
   a    b
2  1  2.0

For some versions of pandas, one might need to use backticks (`) around the column name b.
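
An equivalent form that avoids spelling the literals inside the query string is to pass the list through a local variable with @ (a sketch; bad is just an illustrative variable name):

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.inf], [1, -np.inf], [1, 2]], columns=['a', 'b'])

bad = [np.inf, -np.inf]
print(df.query("b not in @bad"))
#    a    b
# 2  1  2.0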

Thomas Moreau