
I am looking to replace a number with NaN in numpy and am looking for a function like numpy.nan_to_num, except in reverse.

The number is likely to change as different arrays are processed because each can have a uniquely defined NoDataValue. I have seen people using dictionaries, but the arrays are large and filled with both positive and negative floats. I suspect that it is not efficient to load all of these values into a dictionary just to create keys.

I tried using the following but numpy requires that I use any() or all(). I realize that I need to iterate element wise, but hope that a built-in function can achieve this.

def replaceNoData(scanBlock, NDV):
    for n, i in enumerate(array):
        if i == NDV:
            scanBlock[n] = numpy.nan

NDV is GDAL's no data value and array is a numpy array.

Is a masked array the way to go perhaps?

cottontail
Jzl5325
  • I'm not sure I understand what is wrong with the solution you provide. Does it not work properly? – Chris Gregg Jul 15 '11 at 01:13
  • @Chris Gregg This solution needs some indenting, does not need to return array (since it is in-place), should probably avoid using `array` as a variable to avoid confusion with np.array, but most importantly, will be terribly slow compared to typical numpy indexing and broadcasting. – Paul Jul 15 '11 at 01:20
  • @Paul My concern was the speed, so many thanks for the answer below. I used the variables simply to make the code clearer; I too would avoid using array as a name. – Jzl5325 Jul 15 '11 at 04:01

2 Answers

A[A==NDV]=numpy.nan

A==NDV will produce a boolean array that can be used as an index for A
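
A minimal sketch of this mask-based assignment, assuming a float array and a made-up no-data value of -9999.0 (the sample data and the value are illustrative, not taken from the question):

```python
import numpy as np

# Hypothetical no-data value and sample block (illustrative assumptions).
NDV = -9999.0
A = np.array([[1.5, NDV, -2.0],
              [NDV, 0.0, 3.25]])

mask = A == NDV      # boolean array, True wherever the no-data value occurs
A[mask] = np.nan     # in-place assignment; A needs a float dtype for this

print(A)
```

The comparison is an exact equality test, which is usually fine when the no-data value is stored verbatim in the array. If the value might have gone through arithmetic or a dtype conversion, `np.isclose(A, NDV)` is one possible way to build the mask instead.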

Paul
  • It's a special sequence of bits outside the valid range for the datatype in use. It's used to signify missing data or the result of a math error that did not produce a valid value. Hope this helps: https://en.wikipedia.org/wiki/NaN – Paul Oct 21 '16 at 23:19
  • Note that `A.min()` and `A.max()` are not good ways to check if it worked, since they will both return `nan`… If you wish to get min/max, use `np.nanmin(A)` and `np.nanmax(A)`. – Skippy le Grand Gourou Jul 15 '20 at 15:07
  • This doesn't work when A is readonly. – Tobia Mar 02 '22 at 17:44
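
The last two comments can be illustrated with a short sketch. It assumes a made-up no-data value and simulates a read-only array with `setflags`. Working on a copy is only one possible workaround, not necessarily what the commenters had in mind:

```python
import numpy as np

NDV = -9999.0                       # hypothetical no-data value
A = np.array([4.0, NDV, 1.0, 7.5])
A.setflags(write=False)             # simulate a read-only array

B = A.copy()                        # the copy is writeable again
B[B == NDV] = np.nan                # in-place replacement on the copy

plain_min = B.min()                 # NaN propagates through min()/max()
lo, hi = np.nanmin(B), np.nanmax(B) # these skip the NaN entries
```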

You can also use np.where to replace a number with NaN.

arr = np.where(arr==NDV, np.nan, arr)

For example, the following result can be obtained via

arr = np.array([[1, 1, 2], [2, 0, 1]])
arr = np.where(arr==1, np.nan, arr)

[[nan nan  2.]
 [ 2.  0. nan]]

This creates a new copy (unlike A[A==NDV]=np.nan), which can be useful in some cases. For example, if the array initially has an int dtype, it has to be converted into a float array anyway (because replacing values with NaN won't work otherwise), and np.where handles that conversion.
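
As a rough sketch of the dtype point, assuming an integer array and a made-up no-data value of 0: `np.where` returns a new float array, while the in-place route needs an explicit conversion first.

```python
import numpy as np

NDV = 0                                   # hypothetical no-data value
arr_int = np.array([[1, 1, 2], [2, 0, 1]])

# np.where builds a new array; mixing np.nan with ints yields float64.
out = np.where(arr_int == NDV, np.nan, arr_int)

# Equivalent in-place route: convert to float first, then assign by mask.
as_float = arr_int.astype(float)
as_float[as_float == NDV] = np.nan

print(out.dtype)
```

The original `arr_int` is left untouched in both cases, since `np.where` and `astype` each return a new array.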

cottontail