
I'm trying to remove all NaN entries from a set of X-Y data points in NumPy (Python 3.10) before fitting a polynomial to the data with NumPy's polyfit function. The NaN values only occur in the Y data, but for each Y value that is NaN I would like to remove both the X and the Y value.


The following attempt:

import numpy as np

def main():
    dataX = [1, 2, 3, 4, 5]
    dataY = [1, np.nan, 5, np.nan, 1]

    finiteIdx = np.isfinite(dataX) & np.isfinite(dataY)
    poly = np.polyfit(dataX[finiteIdx], dataY[finiteIdx], 2)

if (__name__ == "__main__"):
    main()

Results in:

   poly = np.polyfit(dataX[finiteIdx], dataY[finiteIdx], 2)
TypeError: only integer scalar arrays can be converted to a scalar index

The following attempt:

import numpy as np

def main():
    dataX = [1, 2, 3, 4, 5]
    dataY = [1, np.nan, 5, np.nan, 1]

    poly = np.polyfit(dataX[~np.isnan(dataY)], dataY[~np.isnan(dataY)], 2)

if (__name__ == "__main__"):
    main()

Results in:

   poly = np.polyfit(dataX[~np.isnan(dataY)], dataY[~np.isnan(dataY)], 2)
TypeError: only integer scalar arrays can be converted to a scalar index

The following attempt:

import numpy as np

def main():
    dataX = [1, 2, 3, 4, 5]
    dataY = [1, np.nan, 5, np.nan, 1]

    poly = np.polyfit(dataX[dataY != np.nan], dataY[dataY != np.nan], 2)

if (__name__ == "__main__"):
    main()

Results in:

   raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

What is the proper way of removing all NaN values from a NumPy array?

Thanks for reading my post, any guidance is appreciated.

Runsva

2 Answers

3

Regarding your first attempt, you just need to convert dataX and dataY to NumPy arrays (ndarray); the rest of the code then works unchanged.

import numpy as np

dataX = np.array([1, 2, 3, 4, 5])
dataY = np.array([1, np.nan, 5, np.nan, 1])

finiteIdx = np.isfinite(dataX) & np.isfinite(dataY)
poly = np.polyfit(dataX[finiteIdx], dataY[finiteIdx], 2)

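The error message is misleading: the actual problem is that dataX and dataY are plain Python lists, which cannot be indexed with a boolean array. You can find more information in this question.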
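To make the cause of each error more concrete, here is a minimal sketch of what happens when the lists from the question are indexed directly. The names `data_x`, `data_y`, `mask` and `arr_y` are only illustrative and are not part of the original code.

```python
import numpy as np

data_x = [1, 2, 3, 4, 5]
data_y = [1, np.nan, 5, np.nan, 1]

# np.isfinite accepts a list and returns a boolean ndarray
mask = np.isfinite(data_y)
print(mask)  # [ True False  True False  True]

# A plain Python list cannot be indexed with a boolean ndarray
try:
    data_x[mask]
except TypeError as exc:
    print(exc)  # only integer scalar arrays can be converted to a scalar index

# Comparing a list with a float yields a single Python bool,
# not an element-wise result
print(data_y != np.nan)  # True

# bool is a subclass of int, so data_x[True] is just data_x[1]: a scalar,
# which is why polyfit then complains about "expected 1D vector for x"
print(data_x[data_y != np.nan])  # 2

# Even on an ndarray, != np.nan does not find NaNs, because NaN != NaN
arr_y = np.array(data_y)
print(arr_y != np.nan)  # [ True  True  True  True  True]
```

This also suggests why `np.isnan` or `np.isfinite` on an ndarray is the safer way to build the mask: an equality comparison against `np.nan` never matches.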

liginity
1

You could combine dataX, dataY into a 2D array and drop the columns or rows which have NaN (and then use the array elements as later required). But if you do want to keep them separate then this could help:

import numpy as np

dataX = [1, 2, 3, 4, 5]
dataY = [1, np.nan, 5, np.nan, 1]

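def clean2(arr1, arr2):
    # Keep only the positions where neither input holds a NaN,
    # using the arguments rather than the global dataY
    keep = [i for i, (a, b) in enumerate(zip(arr1, arr2))
            if not (np.isnan(a) or np.isnan(b))]
    res1 = np.array([arr1[i] for i in keep])
    res2 = np.array([arr2[i] for i in keep])
    return res1, res2

dataX, dataY = clean2(dataX, dataY)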

print(dataX)
print(dataY)

gives

[1 3 5]
[1 5 1]
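
The combined 2D-array approach mentioned at the start of this answer could be sketched roughly as follows. This is only one possible way to do it, assuming one row per (x, y) point; the names `points`, `clean`, `clean_x` and `clean_y` are made up for illustration.

```python
import numpy as np

data_x = [1, 2, 3, 4, 5]
data_y = [1, np.nan, 5, np.nan, 1]

# Stack into an array of shape (5, 2): one row per (x, y) point
points = np.column_stack((data_x, data_y))

# Keep only the rows in which no value is NaN
clean = points[~np.isnan(points).any(axis=1)]

# Split back into separate X and Y arrays
clean_x, clean_y = clean[:, 0], clean[:, 1]
print(clean_x)  # [1. 3. 5.]
print(clean_y)  # [1. 5. 1.]

poly = np.polyfit(clean_x, clean_y, 2)
```

Because the X and Y values live in the same row, dropping a row removes both at once, so the two arrays cannot get out of step. Note that stacking converts everything to float, hence the `1.` style output.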
user19077881
  • This can be further optimised and made clearer by having `clean()` take both arrays at the same time, e.g. defining `clean(arrX, arrY)` and cleaning both arrays together. Right now, it accesses a global variable that it will itself change, which breaks the function if `clean(dataY)` is called before `clean(dataX)`. – vmpyr Jun 24 '23 at 09:21
  • @vmpyr Good point. Thanks. I changed the answer to deal with your point. As now written the order of the arguments does not matter. – user19077881 Jun 24 '23 at 09:41