10

I have a large dataframe with inf, -inf values in different columns. I want to replace all inf, -inf values with NaN

I can do so column by column. So this works:

df['column name'] = df['column name'].replace(np.inf, np.nan)

But my code to do so in one go across the dataframe does not.

df.replace([np.inf, -np.inf], np.nan)

The output does not replace the inf values

postcolonialist

2 Answers

11

TL;DR


Replacing inf and -inf

df = df.replace([np.inf, -np.inf], np.nan)

Note that inplace=True is possible but not recommended, and it may be deprecated in a future pandas release.

Slower df.applymap options (applymap is element-wise, and was deprecated in pandas 2.1 in favor of DataFrame.map):

  • df = df.applymap(lambda x: np.nan if x in [np.inf, -np.inf] else x)
  • df = df.applymap(lambda x: np.nan if np.isinf(x) else x)
  • df = df.applymap(lambda x: x if np.isfinite(x) else np.nan)
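If you want a mask-based version without the per-element lambda, something like the following should work. This is only a sketch and assumes every column is numeric (np.isinf will fail on string columns); the frame and the names masked, replaced and same are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame for illustration.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [-np.inf, 2.0, 5.0]})

# np.isinf(df) gives a boolean DataFrame; mask() puts NaN wherever it is True.
masked = df.mask(np.isinf(df))

# For comparison, the replace() call from the TL;DR.
replaced = df.replace([np.inf, -np.inf], np.nan)

# DataFrame.equals treats NaNs in the same positions as equal.
same = masked.equals(replaced)
```

On a frame like this both approaches give the same result, so replace() remains the simpler choice unless you already need the boolean mask for something else.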

Setting mode.use_inf_as_na

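Note that we don't actually have to modify df at all. Setting mode.use_inf_as_na changes how inf and -inf are interpreted by null checks such as isna(), without touching the stored values (the option has been deprecated since pandas 2.1):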

True means treat None, nan, -inf, inf as null
False means None and nan are null, but inf, -inf are not null (default)

  • Either enable globally

    pd.set_option('mode.use_inf_as_na', True)
    
  • Or locally via context manager

    with pd.option_context('mode.use_inf_as_na', True):
        ...
    
tdy
  • 2
    Use case: when I had set mode.use_inf_as_na, I got the error "ValueError: Input X contains infinity or a value too large for dtype('float64')." from MinMaxScaler. After that I went back to df.replace(). – Volkov Maxim Oct 19 '22 at 11:25
  • `mode.use_inf_as_na` changes only how `np.inf` and `np.NINF` are interpreted. Under the hood they are still stored as `±inf`, so if you want to get rid of them, you need to use replace(). – Bohdan Pylypenko May 17 '23 at 09:39
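To check what is actually stored before handing the frame to another library, counting the infinities directly is one option. A rough sketch: count_inf is a hypothetical helper, not a pandas function, and it only looks at numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [-np.inf, 2.0, np.nan]})

def count_inf(frame):
    """Count +inf/-inf values stored in the numeric columns of frame."""
    numeric = frame.select_dtypes(include="number")
    return int(np.isinf(numeric.to_numpy(dtype=float)).sum())

before = count_inf(df)  # infinities physically present in df

# replace() returns a new frame in which the stored values are NaN.
cleaned = df.replace([np.inf, -np.inf], np.nan)
after = count_inf(cleaned)
```

If the count is still non-zero, code that reads the underlying array directly will still see the infinities, regardless of how pandas is configured to interpret them.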
6

pandas.DataFrame.replace (like pandas.Series.replace) does not operate in place by default; it returns a new object.

So your code for the whole dataframe does not work because you need to either assign the result back or pass inplace=True as a parameter. That is also why your column-by-column version works: you are assigning the result back to the column with df['column name'] = ...

Therefore, change df.replace([np.inf, -np.inf], np.nan) to either:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Or assign the result back to df (or to a new dataframe):

df = df.replace([np.inf, -np.inf], np.nan)
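A small sketch of the difference, using a made-up two-column frame (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"x": [1.0, np.inf], "y": [-np.inf, 2.0]})

# The result is discarded here, so df itself is unchanged.
df.replace([np.inf, -np.inf], np.nan)
still_has_inf = bool(np.isinf(df.to_numpy()).any())

# Assigning the result back is what actually updates df.
df = df.replace([np.inf, -np.inf], np.nan)
has_inf_after = bool(np.isinf(df.to_numpy()).any())
nan_count = int(df.isna().sum().sum())
```

After the first call the infinities are still in df; only after the assignment are they gone and replaced by NaN.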


sophocles