How do I remove NaN values from a NumPy array?

Question

[1, 2, NaN, 4, NaN, 8]   ⟶   [1, 2, 4, 8]

score 565 · Accepted Answer · edited Jul 30 '22 at 05:52

To remove NaN values from a NumPy array x:

x = x[~numpy.isnan(x)]

Explanation

The inner function numpy.isnan returns a boolean/logical array which has the value True everywhere that x is not-a-number. Since we want the opposite, we use the logical-not operator ~ to get an array with Trues everywhere that x is a valid number.

Lastly, we use this logical array to index into the original array x, in order to retrieve just the non-NaN values.
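
The steps in the explanation can be traced one at a time. This is a minimal sketch using the sample values from the question; the names `mask`, `keep` and `cleaned` are illustrative and not part of the answer.

```python
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])

# Step 1: True wherever x is NaN
mask = np.isnan(x)

# Step 2: invert the mask, so True marks the valid numbers
keep = ~mask

# Step 3: boolean indexing returns a new array holding only the kept values
cleaned = x[keep]

print(mask.tolist())     # [False, False, True, False, True, False]
print(cleaned.tolist())  # [1.0, 2.0, 4.0, 8.0]
```

Note that the result is a float array: NaN only exists for floating-point dtypes, so an array that contains it is stored as floats.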

edited Jul 30 '22 at 05:52

Mateen Ulhaq

answered Jul 23 '12 at 21:42

jmetz

50

Or `x = x[numpy.isfinite(x)]` – Miki Tebeka Jul 23 '12 at 22:29
23

Or `x = x[~numpy.isnan(x)]`, which is equivalent to mutzmatron's original answer, but shorter. In case you want to keep your infinities around, know that `numpy.isfinite(numpy.inf) == False`, of course, but `~numpy.isnan(numpy.inf) == True`. – chbrown Nov 19 '13 at 19:02
@dax-felizv I agree with @chbrown, NaN and Infinite are not the same in `numpy`. @chbrown - thanks for pointing out the shorthand for `logical_not`, though beware that it is considerably slower - http://stackoverflow.com/questions/15998188/how-can-i-obtain-the-element-wise-logical-not-of-a-pandas-series, http://stackoverflow.com/questions/13600988/python-tilde-unary-operator-as-negation-numpy-bool-array – jmetz Nov 20 '13 at 19:45
Hmm, @mutzmatron -- I figured they did the same thing underneath the hood, and I'm getting very similar results with timeit (as did @unutbu at that first link): `python -m timeit -s "import numpy; bools = numpy.random.uniform(size=10000) >= 0.5" "numpy.logical_not(bools)"` vs. `python -m timeit -s "import numpy; bools = numpy.random.uniform(size=10000) >= 0.5" "~bools"` (`numpy.__version__ == '1.8.0'`) – chbrown Nov 20 '13 at 22:41
@chbrown - you're right, any performance gain with numpy seems to have only occurred on the second poster's machine - I tested `numpy.invert` and `numpy.logical_not` and got the same result for both as for `~`, on numpy v1.7.1. Not sure if architecture affects comparative performance - am testing on my chromebook (armv7l). – jmetz Nov 23 '13 at 12:43
16

For people looking to solve this with an ndarray and maintain the dimensions, use [numpy where](https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html): `np.where(np.isfinite(x), x, 0)` – BoltzmannBrain Sep 07 '17 at 02:51
1

TypeError: only integer scalar arrays can be converted to a scalar index – towry Jun 30 '18 at 14:29
1

@towry: this is happening because your input, `x` is not a numpy array. If you want to use logical indexing, it must be an array - e.g. `x = np.array(x)` – jmetz Jul 02 '18 at 11:32
Also, to completely remove the non-finite rows, use `.any(axis=1)`. The full code will be `x=x[~pd.isnull(x).any(axis=1)]` for Pandas or `x=x[~np.isnan(x).any(axis=1)]` for Numpy. Note that these work on different types of variables. – Dark Mar 25 '20 at 12:46
@Dark - thanks for the useful example for 2d data, though it's beyond the scope of the OP's question which relates only to a 1d input. Perhaps it would be useful for others posted as a separate Q and A? – jmetz Mar 25 '20 at 16:51
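
The comments above contrast `~numpy.isnan` with `numpy.isfinite` and mention `numpy.where` for keeping the array's shape. A small sketch of the difference, with made-up sample values and illustrative variable names:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 4.0])

# ~isnan drops only NaN; both infinities survive
no_nan = x[~np.isnan(x)]

# isfinite drops NaN and both infinities
finite_only = x[np.isfinite(x)]

# np.where keeps the shape and substitutes 0 for every non-finite entry
same_shape = np.where(np.isfinite(x), x, 0)

print(len(no_nan))   # 4
print(finite_only)   # [1. 4.]
print(same_shape)    # [1. 0. 0. 0. 4.]
```

Which of the two masks is appropriate depends on whether infinities count as valid data in your case.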

score 75 · Answer 2 · answered Apr 16 '15 at 15:46

list(filter(lambda v: v == v, x))

This works for both lists and NumPy arrays, since v != v is true only for NaN. (In Python 3, filter returns an iterator, hence the list() call.)
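
A pure-Python sketch of the self-comparison trick on a list of mixed types. The sample data and the name `kept` are made up for illustration.

```python
nan = float("nan")
mixed = [1.0, nan, "a", 2, nan, "b"]

# NaN compares unequal to itself, so `v == v` is False only for the NaN entries
kept = list(filter(lambda v: v == v, mixed))

print(kept)  # [1.0, 'a', 2, 'b']
```

The strings pass through because they compare equal to themselves, and no NaN-specific function is ever called on them.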

answered Apr 16 '15 at 15:46

udibr

12

A hack, but an especially useful one in the case where you are filtering nans from an array of objects with mixed types, such as strings and nans. – Austin Richardson Jun 29 '15 at 14:15
5

This might seem clever, but it obscures the logic, and theoretically other objects (such as custom classes) can also have this property – Chris_Rands Jul 31 '18 at 15:02
Also useful because it only needs `x` to be specified once as opposed to solutions of the type `x[~numpy.isnan(x)]`. This is convenient when `x` is defined by a long expression and you don't want to clutter the code by creating a temporary variable to store the result of this long expression. – Christian O'Reilly Jun 15 '20 at 01:09
1

It might be slow compared to `x[~numpy.isnan(x)]` – smm Aug 21 '20 at 21:23
Similarly, as a list comprehension, e.g. `[v for v in var if v == v]` – Darren Weber May 18 '22 at 17:15
This can avoid `TypeError: ufunc 'isnan' not supported for the input types` when the var contains mixtures of `nan` and strings, as noted by @AustinRichardson – Darren Weber May 18 '22 at 17:18
what is v and what is x? – M_Idk392845 Nov 21 '22 at 21:12

score 43 · Answer 3 · answered Apr 18 '17 at 14:37

For me the answer by @jmetz didn't work, however using pandas isnull() did.

x = x[~pd.isnull(x)]

answered Apr 18 '17 at 14:37

Daniel Kislyuk

2

or: `x = x[x.notnull()]` – kbridge4096 Jun 05 '22 at 17:18
I am not fond of including pandas in the pipeline, but the accepted solution got me `TypeError: ufunc 'isnan' not supported for the input types`. It does not work with strings or object types. This solution did. – Llohann Jun 23 '23 at 07:32

score 36 · Answer 4 · answered Jul 23 '12 at 21:39

Try this:

import math
print([value for value in x if not math.isnan(value)])

For more, read about list comprehensions.

answered Jul 23 '12 at 21:39

liori

5

If you're using numpy both my answer and that by @lazy1 are almost an order of magnitude faster than the list comprehension - lazy1's solution is slightly faster (though technically will also not return any infinity values). – jmetz Jul 24 '12 at 13:54
Don't forget the brackets :) `print ([value for value in x if not math.isnan(value)])` – hypers Nov 22 '17 at 16:09
If you're using numpy like the top answer, you can use this list comprehension with the `np` package. This returns your list without the nans: `[value for value in x if not np.isnan(value)]` – yeliabsalohcin Nov 23 '18 at 14:09

score 20 · Answer 5 · answered May 04 '20 at 09:43

@jmetz's answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unsuitable for removing entire rows or columns from a matrix.

To do so, one should reduce the logical array to one dimension, then index the target array. For instance, the following will remove rows which have at least one NaN value:

x = x[~numpy.isnan(x).any(axis=1)]

See more detail here.
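
The same reduction can be applied along either axis. A minimal sketch with made-up data; `nan_mask`, `rows_kept` and `cols_kept` are illustrative names.

```python
import numpy as np

x = np.array([
    [1.0, 2.0, np.nan],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
])

nan_mask = np.isnan(x)

# any(axis=1) collapses each row to one flag: does this row contain a NaN?
rows_kept = x[~nan_mask.any(axis=1)]

# any(axis=0) does the same per column; that mask is applied on the second axis
cols_kept = x[:, ~nan_mask.any(axis=0)]

print(rows_kept.shape)  # (2, 3)
print(cols_kept.shape)  # (3, 2)
```

Both results stay two-dimensional, unlike `x[~np.isnan(x)]`, which flattens the array.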

score 8 · Answer 6 · edited Jan 09 '20 at 13:55

As shown by others

x[~numpy.isnan(x)]

works. But it will throw an error if the numpy dtype is not a native data type, for example if it is object. In that case you can use pandas.

x[~pandas.isna(x)] or x[~pandas.isnull(x)]
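
A short sketch of the object-dtype case, assuming pandas is installed. The sample array and the `isnan_failed` flag are made up for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([1.5, np.nan, "text", None], dtype=object)

try:
    np.isnan(x)
    isnan_failed = False
except TypeError:
    # np.isnan is not defined for object arrays
    isnan_failed = True

# pd.isna checks element by element and also treats None as missing
cleaned = x[~pd.isna(x)]

print(isnan_failed)      # True
print(cleaned.tolist())  # [1.5, 'text']
```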

edited Jan 09 '20 at 13:55

Shashank Srivastava

answered Nov 25 '17 at 12:55

koliyat9811

score 7 · Answer 7 · answered Feb 16 '18 at 09:19

If you're using numpy

# first get the indices where the values are finite
ii = np.isfinite(x)

# second get the values
x = x[ii]

answered Feb 16 '18 at 09:19

aloha

score 7 · Answer 8 · answered Mar 16 '19 at 06:37

The accepted answer changes shape for 2D arrays. I present a solution here, using the Pandas dropna() functionality. It works for 1D and 2D arrays. In the 2D case you can choose whether to drop the row or column containing np.nan.

import pandas as pd
import numpy as np

def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped=pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim==1:
        dropped=dropped.flatten()
    return dropped

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan ,1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan] ,[1700,1800,np.nan]] )


print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')

print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')

print('\n\n'+'='*20+' y[np.logical_not(np.isnan(y))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\nboolean indexing:\n',y[np.logical_not(np.isnan(y))],sep='')

Result:

==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]

dropna:
[1400. 1500. 1600. 1700.]


==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna (rows):
[[1400. 1500. 1600.]]

dropna (columns):
[[1500.]
 [   0.]
 [1800.]]


==================== y[np.logical_not(np.isnan(y))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

boolean indexing:
[1400. 1500. 1600.    0. 1700. 1800.]

score 7 · Answer 9 · answered Mar 15 '21 at 18:36

In case it helps, for simple 1d arrays:

x = np.array([np.nan, 1, 2, 3, 4])

x[~np.isnan(x)]
>>> array([1., 2., 3., 4.])

but if you wish to expand to matrices and preserve the shape:

x = np.array([
    [np.nan, np.nan],
    [np.nan, 0],
    [1, 2],
    [3, 4]
])

x[~np.isnan(x).any(axis=1)]
>>> array([[1., 2.],
           [3., 4.]])

I encountered this issue when dealing with pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all cost due to its inefficiency.

score 6 · Answer 10 · answered Jun 23 '16 at 20:35

Doing the above :

x = x[~numpy.isnan(x)]

or

x = x[numpy.logical_not(numpy.isnan(x))]

I found that resetting to the same variable (x) did not remove the actual nan values and had to use a different variable. Setting it to a different variable removed the nans. e.g.

y = x[~numpy.isnan(x)]
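
Reassigning to the same name normally works, because boolean indexing produces a new array and the assignment rebinds the name. One possible explanation for the behaviour described, offered only as a guess since the answer gives no further detail, is that a second reference to the original array was being inspected afterwards. The names `original` and `alias` below are hypothetical.

```python
import numpy as np

original = np.array([1.0, np.nan, 3.0])
alias = original  # a second name for the same array object

# Rebinds `original` to a new, filtered array; the old array itself is unchanged
original = original[~np.isnan(original)]

print(original.tolist())      # [1.0, 3.0]
print(np.isnan(alias).any())  # True: the alias still refers to the unfiltered data
```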

answered Jun 23 '16 at 20:35

melissaOu

This is strange; according to [the docs](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing), boolean array indexing (which this is), is under **advanced indexing** which apparently "always returns a copy of the data", so you should be over-writing `x` with the new value (i.e. without the NaNs...). Can you provide any more info as to why this could be happening? – jmetz Mar 24 '17 at 10:35

bitbang · Answer 11 · 2020-12-18T10:59:17.110

1

Simply fill the NaNs with a value:

import numpy

x = numpy.array([
    [0.99929941, 0.84724713, -0.1500044],
    [-0.79709026, numpy.nan, -0.4406645],
    [-0.3599013, -0.63565744, -0.70251352]])

x[numpy.isnan(x)] = .555

print(x)

# [[ 0.99929941  0.84724713 -0.1500044 ]
#  [-0.79709026  0.555      -0.4406645 ]
#  [-0.3599013  -0.63565744 -0.70251352]]

edited Dec 18 '20 at 10:59

answered Dec 18 '20 at 10:08

bitbang

Darren Weber · Answer 12 · 2022-05-19T16:50:42.190

pandas provides tools for detecting and representing missing values across data types.

https://pandas.pydata.org/docs/user_guide/missing_data.html

The np.isnan() function is not compatible with all data types, e.g.

>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The pd.isna() and pd.notna() functions are compatible with many data types and pandas introduces a pd.NA value:

>>> import numpy as np
>>> import pandas as pd

>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0    NaN
1      x
2      y
dtype: object
>>> values.loc[pd.isna(values)]
0    NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0    <NA>
dtype: object
>>> values
0    <NA>
1       x
2       y
dtype: object

#
# using map with lambda, or a list comprehension
#

>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']
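
The examples above replace missing entries with pd.NA. To remove them instead, as the question asks, something along these lines should work. It is a sketch that assumes pandas is installed; `series_clean` and `list_clean` are illustrative names.

```python
import numpy as np
import pandas as pd

values = [np.nan, "x", "y", None]

# Series.dropna removes NaN, None and pd.NA entries
series_clean = pd.Series(values).dropna()

# The same idea as a list comprehension over plain Python values
list_clean = [v for v in values if not pd.isna(v)]

print(series_clean.tolist())  # ['x', 'y']
print(list_clean)             # ['x', 'y']
```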

score -2 · Answer 13 · answered Jun 21 '17 at 18:03

The simplest way is:

numpy.nan_to_num(x)

Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
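
Keep in mind that numpy.nan_to_num replaces values rather than removing them. A quick sketch of its default behaviour, with made-up sample values:

```python
import numpy as np

x = np.array([1.0, np.nan, 4.0, np.inf])

replaced = np.nan_to_num(x)

# The length is unchanged: nothing was removed
print(replaced.shape)  # (4,)

# NaN becomes 0.0 by default
print(replaced[1])     # 0.0

# +inf becomes the largest finite value of the dtype
print(replaced[3] == np.finfo(np.float64).max)  # True
```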

answered Jun 21 '17 at 18:03

Bruno Rodrigues de Oliveira

5

Welcome to SO! The solution you propose does not answer the problem: your solution substitutes `NaN`s with zero (and infinities with large finite numbers), while the OP asked to entirely remove the elements. – Pier Paolo Jun 21 '17 at 18:49
