6

I have a list of strings

x = ['A', 'B', nan, 'D']

and want to remove the nan.

I tried:

x = x[~numpy.isnan(x)]

But that only works if it contains numbers. How do we solve this for strings in Python 3+?

Mazdak
WJA
  • @Kasramvd Can you explain what you mean by "numpy nan"? – Josh Lee Mar 23 '17 at 17:56
  • @JoshLee The `nan` object from the numpy module, which the OP is using. I changed it to numpy so that future askers can find the question easily. – Mazdak Mar 23 '17 at 18:14

4 Answers

6

If you have a numpy array of strings, np.nan is stored as the string 'nan', so you can simply compare each item against that string. If you have a list, you can check identity with is and np.nan, since it's a singleton object.

In [25]: x = np.array(['A', 'B', np.nan, 'D'])

In [26]: x
Out[26]: 
array(['A', 'B', 'nan', 'D'], 
      dtype='<U3')

In [27]: x[x != 'nan']
Out[27]: 
array(['A', 'B', 'D'], 
      dtype='<U3')


In [28]: x = ['A', 'B', np.nan, 'D']

In [30]: [i for i in x if i is not np.nan]
Out[30]: ['A', 'B', 'D']

Or, as a functional approach for a Python list:

In [34]: from operator import is_not

In [35]: from functools import partial

In [37]: f = partial(is_not, np.nan)

In [38]: x = ['A', 'B', np.nan, 'D']

In [39]: list(filter(f, x))
Out[39]: ['A', 'B', 'D']
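One caveat with the identity check, sketched here under a stated assumption: `math.nan` stands in for `np.nan` (both are ordinary Python float objects). `is not` removes only that one object, so a NaN created elsewhere, for example by `float('nan')`, passes through. Comparing each item with itself is one possible value-based alternative, since NaN is not equal to itself.

```python
import math

nan = math.nan  # stand-in for np.nan (assumption: both are plain Python floats)
x = ['A', 'B', nan, float('nan'), 'D']

# Identity check: removes only the single `nan` object bound above.
by_identity = [i for i in x if i is not nan]

# Self-equality check: NaN is the only value in this list that is not equal to itself.
by_value = [i for i in x if i == i]

print(len(by_identity))  # 4, because the float('nan') entry survives
print(by_value)          # ['A', 'B', 'D']
```

The self-equality version does not depend on where the NaN came from, which may matter when the list is built from several sources.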
Mazdak
  • aggregate things like: `[i for i in x if not i in ['nan', np.nan]]`, +1 otherwise – Colonel Beauvel Mar 23 '17 at 15:34
  • @ColonelBeauvel Yeah, that's a good idea if you don't know what kind of data structure you're dealing with. – Mazdak Mar 23 '17 at 15:38
  • NaN is not a singleton. – Josh Lee Mar 23 '17 at 17:55
  • @JoshLee Why? I think that as long as you can't create different instances of a particular object, it would be referred to as a singleton. Is there anything special about `np.nan`? – Mazdak Mar 23 '17 at 18:10
  • `np.nan` is just some floating point constant. You wouldn't compare `is math.pi` either, for the same reason. – Josh Lee Mar 23 '17 at 18:16
  • @JoshLee Well, IMHO, it doesn't make any difference. Almost everything in Python is an object, even the code, and once something is an object it can be a singleton or a regular object (AFAIK), like integers from -5 to 256 or other singletons in Python that get cached in memory instead of having multiple instances with different ids. – Mazdak Mar 23 '17 at 18:41
3

You can use math.isnan and a good old list comprehension.

Something like this would do the trick:

import math
# math.isnan raises TypeError on strings, so only test the float items
x = [y for y in x if not (isinstance(y, float) and math.isnan(y))]
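For context, `math.isnan` accepts only real numbers, so passing it a string such as `'A'` raises a `TypeError`. A minimal sketch of one way to handle a mixed list follows; the helper name `is_nan` is hypothetical and not part of any library.

```python
import math

def is_nan(value):
    """Return True only for a float NaN; strings and other types never count."""
    return isinstance(value, float) and math.isnan(value)

x = ['A', 'B', float('nan'), 'D']

try:
    math.isnan('A')  # a string is rejected outright
except TypeError as exc:
    print('TypeError:', exc)

cleaned = [y for y in x if not is_nan(y)]
print(cleaned)  # ['A', 'B', 'D']
```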
Horia Coman
1

You may want to avoid np.nan with strings and use None instead. But if you do have nan, you could do this:

import numpy as np

[i for i in x if i is not np.nan]
# ['A', 'B', 'D']
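The None suggestion can be sketched as follows, assuming you control how the list is built and can store None for missing entries. None is a true singleton in Python, so the identity check is reliable here.

```python
# Missing values are marked with None instead of NaN (assumed input shape).
x = ['A', 'B', None, 'D']

# `is not None` is the idiomatic test, because there is exactly one None object.
cleaned = [i for i in x if i is not None]
print(cleaned)  # ['A', 'B', 'D']
```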
Psidom
1

You could also try this:

[s for s in x if str(s) != 'nan']

Or, convert everything to str at the beginning:

[s for s in map(str, x) if s != 'nan']

Both approaches yield ['A', 'B', 'D'].
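Two side effects of the string comparison are worth keeping in mind, shown here on a made-up list: a genuine string `'nan'` is dropped along with the float NaN, and the `map(str, x)` variant turns every remaining item into a string.

```python
# Illustrative input: a real string 'nan', a float NaN, and a number.
x = ['A', 'nan', float('nan'), 1.5]

# Variant 1: original types are kept, but the literal string 'nan' is removed too.
kept_types = [s for s in x if str(s) != 'nan']
print(kept_types)  # ['A', 1.5]

# Variant 2: everything is converted to str first, so 1.5 becomes '1.5'.
all_strings = [s for s in map(str, x) if s != 'nan']
print(all_strings)  # ['A', '1.5']
```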

blacksite