
Is there a way to store NaN in a Numpy array of integers? I get:

import numpy as np
a = np.array([1], dtype=np.int64)  # `long` only exists as a builtin in Python 2
a[0] = np.nan

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer
Hooked
Yariv

2 Answers


No, you can't, at least with the current version of NumPy. NaN is a special value that exists only for floating-point arrays.

There has been talk of introducing a special bit that would let non-float arrays store what would, in practice, correspond to a NaN, but so far (October 2012) it is only talk.
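A minimal sketch of the limitation described above, assuming `np.int64` as a stand-in for the question's `long`: assigning NaN into an integer array raises, while a float array (or a list mixing NaN with integers) gets a float dtype that can hold it.

```python
import numpy as np

# int64 stands in for the question's `long` (an assumption).
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[0] = np.nan  # there is no integer bit pattern meaning "NaN"
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# A float copy of the same data accepts NaN.
floats = ints.astype(np.float64)
floats[0] = np.nan
print(floats, np.isnan(floats))

# Mixing NaN into a list makes NumPy choose a float dtype automatically.
mixed = np.array([1, np.nan, 3])
print(mixed.dtype)  # float64
```

Converting to float is only a workaround: the data are no longer integers.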

In the meantime, you may want to consider the numpy.ma package: instead of picking an invalid integer like -99999, you could use the special numpy.ma.masked value to represent an invalid value.

>>> import numpy as np
>>> a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
>>> a[1] = np.ma.masked
>>> a
masked_array(data=[1, --, 3, 4, 5],
             mask=[False,  True, False, False, False],
       fill_value=999999)
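A short sketch of how the masked entry then behaves, using only standard `MaskedArray` methods: reductions skip masked values, and `filled()` converts back to a plain integer ndarray with a chosen placeholder (here -1, an arbitrary choice).

```python
import numpy as np

a = np.ma.array([1, 2, 3, 4, 5], dtype=int)
a[1] = np.ma.masked  # mark element 1 invalid; the dtype stays integer

print(a.sum())       # 13: masked entries are skipped (1 + 3 + 4 + 5)
print(a.mean())      # 3.25: averaged over the 4 unmasked values
print(a.count())     # 4
print(a.filled(-1))  # plain ndarray with the masked slot replaced by -1
```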
Pierre GM
  • But be aware that there is a huge performance cost to using masked arrays, as they are implemented in pure Python! – gaborous Apr 08 '15 at 00:18
  • @gaborous Whoa, really? I thought they were the recommended way to do such things? – endolith Nov 19 '18 at 22:16
  • @endolith Yes, I found the info a long time ago in one of NumPy's GitHub issues, but I don't have the link anymore. Since that was a long time ago, this might have been optimized (although I doubt it; one would need to compile to Cython or similar first). – gaborous Nov 20 '18 at 15:06
  • Just to be clear, `nan` and `null` are not the same thing. Also, while it is not a direct substitute for `numpy`, `cuDF` does support nulls. – cwharris Sep 10 '19 at 18:51

NaN is a floating-point-only concept; there is no representation of it in the integers, so no :)

Pick an invalid value instead, such as -99999.
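A minimal sketch of the sentinel approach, assuming -99999 never occurs as real data. It shows that the sentinel has to be filtered out explicitly, what happens if it is not, and that `np.ma.masked_equal` can turn it into a masked array when needed.

```python
import numpy as np

SENTINEL = -99999  # illustrative choice; must never occur as real data

data = np.array([10, SENTINEL, 30, 40], dtype=np.int64)

valid = data != SENTINEL   # boolean mask of the real values
print(data[valid].mean())  # averages 10, 30 and 40 only

# Forgetting to exclude the sentinel silently skews results:
print(data.mean())         # -99999 is averaged in, giving a negative mean

# The sentinel can be turned into a masked array when convenient.
masked = np.ma.masked_equal(data, SENTINEL)
print(masked.mean())       # same result as data[valid].mean()
```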

Julian
  • Picking a canonical value as invalid wouldn't be a good solution, as it wouldn't replicate the properties of nan, namely that comparisons (==, <, >, <=, >=) between nan and any other value, including itself, are false. – christang Nov 11 '15 at 13:46
  • Using a sentinel value isn't ideal, but it's sufficient under the condition that you understand your data well enough to know the sentinel will not interfere with your computations. For instance, if you know your values are (not just "should be") always `>= 0`, then using a negative sentinel is acceptable (unless you're doing an operation where the outcome could have a different sign than the input, such as `-1 * -1`). If you're writing a framework and end up using sentinels, you should probably allow that value to be chosen by the user on an individual operation basis. Again, _not ideal_. – cwharris Sep 10 '19 at 18:55
  • If your dataset is not going to change, there are two easy ways that come closest to ideal: np.amin()-1 and np.amax()+1. np.amin()-1 is a unique placeholder unless np.amin() == np.iinfo(np.int32).min, and np.amax()+1 is one unless np.amax() == np.iinfo(np.int32).max. If both limits are hit, you can use np.unique(): if the number of unique values equals the number of values the data type can represent, you must throw an error, as no placeholder is possible. Otherwise, search efficiently for the first value not in np.unique() by taking np.diff() and finding the first place where the difference is greater than 1. – Gregory Morse Nov 26 '19 at 03:04
  • Sentinel values are actually used in a lot of real-world databases, especially in the healthcare industry, such as Weight of Newborn Babies, where -1 is used to designate an unsuccessful birth. – NoName Jan 13 '20 at 16:45
  • @NoName Yes, and that's bad. If I had a dollar for every bug caused by "Sentinel" values being used where a `NaN` or `missing` object should have been... – Closed Limelike Curves Apr 26 '23 at 16:38
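The placeholder search described in Gregory Morse's comment can be sketched as follows. `pick_sentinel` is a hypothetical helper, not a NumPy function. It generalizes the comment's `np.int32` to whatever integer dtype the array has via `np.iinfo`, and it replaces the `np.diff` step with an equivalent "next value is not current + 1" test, which cannot overflow at the dtype limits. As the other comments point out, a sentinel found this way still does not behave like NaN in comparisons.

```python
import numpy as np


def pick_sentinel(arr: np.ndarray) -> int:
    """Return a value of arr's integer dtype that does not occur in arr.

    Hypothetical helper: try min - 1, then max + 1, then look for a gap
    between consecutive sorted unique values.
    """
    info = np.iinfo(arr.dtype)  # raises ValueError for non-integer dtypes
    if arr.size == 0:
        return int(info.min)  # nothing to collide with
    lo, hi = int(arr.min()), int(arr.max())  # Python ints: no overflow
    if lo > info.min:
        return lo - 1
    if hi < info.max:
        return hi + 1
    # Both dtype limits are in use: search for a hole.
    uniq = np.unique(arr)  # sorted, de-duplicated
    # Same test as np.diff(uniq) > 1, but uniq[:-1] + 1 cannot overflow
    # because every element except the last is below info.max.
    gaps = np.flatnonzero(uniq[1:] != uniq[:-1] + 1)
    if gaps.size == 0:
        raise ValueError("every value of the dtype is in use; no sentinel possible")
    return int(uniq[gaps[0]]) + 1
```

For example, `pick_sentinel(np.array([0, 5, 7], dtype=np.int32))` returns -1, the simple `min - 1` case.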