I have two numpy arrays that contain NaNs:

A = np.array([np.nan,   2,   np.nan,   3,   4])
B = np.array([   1  ,   2,     3   ,   4,  np.nan])

Is there a smart way, using numpy, to remove the NaNs from both arrays and also remove the element at the corresponding index in the other array? The result should look like this:

A = array([  2,   3, ])
B = array([  2,   4, ])
NicolaiF
  • are you also using `pandas`? (there's [`dropna`](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna) that does what you want with `DataFrame` objects) – Kos Mar 18 '15 at 11:18
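
For readers who do have pandas available, the `dropna` suggestion in the comment above might look roughly like this. It is only a sketch: the column names "A" and "B" and the variables `A_clean` and `B_clean` are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd

A = np.array([np.nan, 2, np.nan, 3, 4])
B = np.array([1, 2, 3, 4, np.nan])

# Put both arrays side by side so each row holds one (A, B) pair,
# then drop every row in which at least one value is NaN.
df = pd.DataFrame({"A": A, "B": B}).dropna()

# Convert the surviving columns back to plain numpy arrays.
A_clean = df["A"].to_numpy()
B_clean = df["B"].to_numpy()

print(A_clean)
print(B_clean)
```

This pulls in pandas just for the filtering step, so it mostly makes sense if the data already lives in a `DataFrame`.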

2 Answers

You could add the two arrays together: the sum is NaN at every position where either array has a NaN. Use that sum to build an index of the non-NaN positions, then use the index to select from your original numpy arrays:

In [193]:

A = np.array([np.nan,   2,   np.nan,   3,   4])
B = np.array([   1  ,   2,     3   ,   4,  np.nan])
idx = np.where(~np.isnan(A+B))
idx
print(A[idx])
print(B[idx])
[ 2.  3.]
[ 2.  4.]

Output from A+B, showing how NaN propagates through the sum:

In [194]:

A+B
Out[194]:
array([ nan,   4.,  nan,   7.,  nan])

EDIT

As @Oliver W. has correctly pointed out, np.where is unnecessary: ~np.isnan(A+B) already produces a boolean mask that you can use to index into the arrays directly:

In [199]:

A = np.array([np.nan,   2,   np.nan,   3,   4])
B = np.array([   1  ,   2,     3   ,   4,  np.nan])
idx = (~np.isnan(A+B))
print(A[idx])
print(B[idx])
[ 2.  3.]
[ 2.  4.]
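
If this is needed in more than one place, the boolean-mask version could be wrapped in a small helper. This is a minimal sketch; the function name `drop_nan_pairs` is made up for illustration, and it assumes both inputs are float arrays of the same length.

```python
import numpy as np

def drop_nan_pairs(a, b):
    """Return copies of a and b without the positions where either is NaN.

    Hypothetical helper: relies on NaN propagating through a + b,
    exactly as in the snippet above.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = ~np.isnan(a + b)  # True where both values are valid
    return a[keep], b[keep]

A = np.array([np.nan, 2, np.nan, 3, 4])
B = np.array([1, 2, 3, 4, np.nan])

A_clean, B_clean = drop_nan_pairs(A, B)
print(A_clean)
print(B_clean)
```

Computing the mask once and applying it to both arrays keeps the two results aligned by construction.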
EdChum
  • Using `np.where` is entirely superfluous though. – Oliver W. Mar 18 '15 at 12:00
  • @OliverW. Hmm. yes you are correct, I was originally thinking that I needed the integer indices but yes it's unnecessary here, I'll update my answer – EdChum Mar 18 '15 at 12:02
8

A[~(np.isnan(A) | np.isnan(B))]

B[~(np.isnan(A) | np.isnan(B))]
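
The mask in the two lines above can also be computed once and reused. The sketch below does that, and additionally illustrates one case where this explicit mask differs from a mask based on A+B: infinities of opposite sign sum to NaN, so a sum-based mask would drop that position even though neither input is NaN there. The example data and the names `keep` and `keep_sum` are illustrative, not from the question.

```python
import numpy as np

# Example data with an inf / -inf pair at index 2 (not in the original question).
A = np.array([np.nan, 2.0, np.inf, 3.0, 4.0])
B = np.array([1.0, 2.0, -np.inf, 4.0, np.nan])

# Explicit mask: True only where neither array holds a NaN.
keep = ~(np.isnan(A) | np.isnan(B))
A_clean = A[keep]
B_clean = B[keep]

# Sum-based mask for comparison; inf + (-inf) is NaN, so index 2 is lost.
with np.errstate(invalid="ignore"):  # silence the "invalid value" warning
    keep_sum = ~np.isnan(A + B)

print(keep)
print(keep_sum)
```

For data without infinities, both masks select the same positions.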

FuzzyDuck