-1

I want to run NumPy operations on lists that contain NaN values. Specifically, I want to do a linear regression on two lists. One of them contains NaNs, so I can't run the regression on it directly. But if I delete the NaN values, its length no longer matches the length of the list without NaNs. For example:

x = [1,2,3,4,5,NaN]  (if I delete the NaN value, the size of x becomes 5)
y = [1,2,3,4,5,6]  (the size of y is 6)

(x,y) = (1,1), (2,2), (3,3), (4,4), (5,5), (NaN, 6)

I want my linear regression to skip the data point (NaN, 6)

How can I do this?

Zakariah Siyaji
  • a = np.array([(1,1), (2,2), (3,3), (4,4), (5,5), (np.nan, 6)]) then a[~np.any(np.isnan(a), axis=1)] – NaN Jul 12 '19 at 19:24
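
To spell out what this comment suggests, here is a minimal sketch. The names `mask`, `clean`, `x_clean` and `y_clean` are only illustrative and do not come from the comment. It stacks the (x, y) pairs as rows of a 2-D array and keeps only the rows that contain no NaN, so both columns stay the same length.

```python
import numpy as np

# Each row is one (x, y) data point; the last one has a NaN in x.
a = np.array([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (np.nan, 6)])

# True for rows where neither value is NaN.
mask = ~np.any(np.isnan(a), axis=1)
clean = a[mask]

# Split back into separate, equally sized x and y arrays.
x_clean, y_clean = clean[:, 0], clean[:, 1]
```

This drops the point (NaN, 6) as a whole, which is what the question asks for.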

2 Answers

1

You need to filter both arrays by the condition, not only one of them.

import numpy as np
x = np.array([1,2,np.nan,4,5,np.nan])  # np.nan; the np.NaN alias was removed in NumPy 2.0
y = np.array([1,2,3,4,5,6])

condition = ~np.isnan(x)
xp = x[condition]
yp = y[condition]

print(xp)
print(yp)

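So xp prints as [1. 2. 4. 5.] (x is a float array because it holds NaN) and yp prints as [1 2 4 5]. The positions where x is NaN are dropped from both arrays, so their sizes still match.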
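
To connect this back to the linear regression in the question, here is a rough sketch that applies the same mask to the question's data and then fits a straight line. Using `np.polyfit` with degree 1 is my assumption about how the regression would be done; the names `keep`, `slope` and `intercept` are illustrative.

```python
import numpy as np

# Data from the question: the last x value is NaN.
x = np.array([1, 2, 3, 4, 5, np.nan])
y = np.array([1, 2, 3, 4, 5, 6])

# Apply one mask to both arrays so the pairs stay aligned.
keep = ~np.isnan(x)

# Degree-1 polynomial fit, i.e. ordinary least-squares line y = slope * x + intercept.
slope, intercept = np.polyfit(x[keep], y[keep], 1)
```

The point (NaN, 6) is skipped, and the remaining five points lie on y = x, so the fitted slope is close to 1 and the intercept close to 0.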

ImportanceOfBeingErnest
  • This won't work if there is a string in the array. For example, what if x is `np.array([1,2,np.nan,4,5,'Hello World'])`? This snippet won't be able to remove the string. How can I handle strings as well? – Zakariah Siyaji Jul 13 '19 at 05:01
  • See [this](https://stackoverflow.com/questions/37996471/element-wise-test-of-numpy-array-is-numeric) – ImportanceOfBeingErnest Jul 13 '19 at 11:37
-1

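Here is an answer that handles strings as well as NaN values. It relies on a helper, isNumber, which the original post did not show; the version below is an assumption that treats a value as numeric if float() accepts it.

import numpy as np

def isNumber(value):
    # Assumed helper: True if value converts to float (note that 'nan' also converts).
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def skipNaN(x, y):
    # dtype=object keeps mixed numbers and strings as they are.
    x = np.array(x, dtype=object)
    y = np.array(y, dtype=object)

    # Keep only the pairs where both values are numeric.
    condition1 = np.array([isNumber(i) and isNumber(j) for i, j in zip(x, y)], dtype=bool)
    x = x[condition1].astype('float64')
    y = y[condition1].astype('float64')

    # Then drop the pairs where either value is NaN.
    condition2 = ~(np.isnan(x) | np.isnan(y))
    return [x[condition2], y[condition2]]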
Zakariah Siyaji