
I have two NumPy matrices with slightly different alignment:

X

    id,  value
     1,   0.78
     2,   0.65
     3,   0.77
       ...
       ...
    98,   0.88
    99,   0.77
   100,   0.87

Y

    id,  value
     1,   0.79
     2,   0.65
     3,   0.78
       ...
       ...
    98,   0.89
   100,   0.80

Y is simply missing one ID (99 in the excerpt above). I would like to perform vector operations on X and Y (e.g. correlation, difference, etc.), which means I need to drop the row in X whose ID is missing from Y. How would I do that?

user3240688

3 Answers


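Assuming every value that appears in both arrays is identical, the extra element in x is simply the difference between the two sums. (Note that this does not hold for the sample data in the question, where matching IDs have slightly different values, e.g. 0.78 vs 0.79 for ID 1.)

This solution runs in O(n): it needs only two sums and one pass over x.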

Data generation:

import numpy as np

# x = np.arange(10)
x = np.random.rand(10)
y = np.r_[x[:6], x[7:]]  # y is x without the element at index 6
print(x)
np.random.shuffle(y)
print(y)

Solution:

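Note that np.isclose() is used instead of == because the difference of the sums is subject to floating-point rounding and may not equal the missing value exactly.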

sum_x = np.sum(x)
sum_y = np.sum(y)
diff = sum_x - sum_y
value_index = np.argwhere(np.isclose(x, diff))

print(value_index)
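A short illustration of why an exact comparison is avoided in this step: adding a value to a sum and subtracting it again does not, in general, give back the original float bit-for-bit. The numbers 0.1 and 0.2 below are just a convenient example, not taken from the answer's data.

```python
import numpy as np

a, b = 0.1, 0.2
recovered = (a + b) - b  # mathematically equal to a, but rounded twice

print(recovered == a)            # exact comparison
print(np.isclose(recovered, a))  # tolerance-based comparison
```

Here the exact comparison prints False while np.isclose() prints True, which is the same situation as comparing x against sum_x - sum_y.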

Delete the element at that index:

deleted = np.delete(x, value_index)
print(deleted)

Output:

[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.25859596 0.97969841 0.77368822 0.80105397]
[0.97969841 0.77368822 0.28651572 0.36373441 0.5030346  0.895204
 0.03352821 0.80105397 0.20693263]
[[6]]
[0.36373441 0.5030346  0.895204   0.03352821 0.20693263 0.28651572
 0.97969841 0.77368822 0.80105397]
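A minimal sketch that wraps the sum-difference idea in a function and refuses to guess when the match is ambiguous. The name `drop_extra_by_sum` is hypothetical, and like the answer it assumes y is exactly x with one element removed (values identical, order may differ).

```python
import numpy as np

def drop_extra_by_sum(x, y):
    """Hypothetical helper: return x without the one element that y lacks.

    Assumes y equals x with exactly one element removed (order may differ).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size != y.size + 1:
        raise ValueError("x must have exactly one more element than y")
    diff = x.sum() - y.sum()  # the value that is missing from y
    candidates = np.flatnonzero(np.isclose(x, diff))
    if candidates.size != 1:
        # zero or several elements are close to diff, so the trick cannot decide
        raise ValueError(f"expected one match, found {candidates.size}")
    return np.delete(x, candidates[0])

x = np.array([0.5, 0.25, 0.125, 0.75])
y = np.array([0.75, 0.5, 0.25])  # 0.125 missing, order shuffled
print(drop_extra_by_sum(x, y))   # x with 0.125 removed
```

Raising on multiple matches is a conservative choice: if x contains near-duplicates of the missing value, the sums alone cannot tell which one to drop.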
Gulzar

If the missing values are stored as NaN, you can filter them out:

import numpy

X = X[~numpy.isnan(X)]  # keep only the non-NaN entries
Y = Y[~numpy.isnan(Y)]

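Then you can perform whatever operation you want, provided the NaNs in X and Y sit at the same positions; otherwise the filtered arrays no longer line up.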
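A sketch that builds a single mask from both arrays so the surviving entries stay paired, assuming X and Y hold only the value column, have the same length, and mark a missing measurement with NaN. The values come from the question's excerpt (IDs 1, 2, 3, 98, 99, 100), with Y's entry for ID 99 set to NaN for illustration.

```python
import numpy as np

# Assumed layout: same-length value arrays, NaN where a measurement is missing.
X = np.array([0.78, 0.65, 0.77, 0.88, 0.77, 0.87])
Y = np.array([0.79, 0.65, 0.78, 0.89, np.nan, 0.80])

mask = ~(np.isnan(X) | np.isnan(Y))  # True only where both values are present
x_ok, y_ok = X[mask], Y[mask]

difference = x_ok - y_ok
correlation = np.corrcoef(x_ok, y_ok)[0, 1]  # Pearson correlation coefficient
print(difference, correlation)
```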

Majid Hajibaba

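Use np.in1d to keep only the rows of X whose ID (first column) also appears in Y. In NumPy 2.0 and later np.in1d is deprecated, and np.isin(X[:, 0], Y[:, 0]) is the drop-in replacement: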

>>> X
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [ 9.  ,  0.1 ],
       [10.  ,  0.1 ]])

>>> Y
array([[ 1.  ,  0.19],
       [ 2.  ,  0.96],
       [ 3.  ,  0.24],
       [ 4.  ,  0.44],
       [ 5.  ,  0.12],
       [ 6.  ,  0.91],
       [ 7.  ,  0.7 ],
       [ 8.  ,  0.54],
       [10.  ,  0.09]])
>>> X[np.in1d(X[:, 0], Y[:, 0])]
array([[ 1.  ,  0.53],
       [ 2.  ,  0.72],
       [ 3.  ,  0.44],
       [ 4.  ,  0.35],
       [ 5.  ,  0.32],
       [ 6.  ,  0.14],
       [ 7.  ,  0.52],
       [ 8.  ,  0.4 ],
       [10.  ,  0.1 ]])
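Filtering X this way lines the rows up only if both arrays are sorted by ID. A sketch that aligns both arrays explicitly uses np.intersect1d with return_indices=True, which gives the positions of each shared ID in X and in Y. The (ID, value) rows below come from the question's excerpt; Y is deliberately shuffled here to show that row order does not matter.

```python
import numpy as np

# (ID, value) rows from the question's excerpt; Y is unsorted and lacks ID 99.
X = np.array([[1, 0.78], [2, 0.65], [3, 0.77], [98, 0.88], [99, 0.77], [100, 0.87]])
Y = np.array([[1, 0.79], [3, 0.78], [2, 0.65], [100, 0.80], [98, 0.89]])

# Sorted IDs present in both arrays, plus where each one sits in X and in Y.
common, ix, iy = np.intersect1d(X[:, 0], Y[:, 0], return_indices=True)

x_vals = X[ix, 1]  # values ordered by common ID
y_vals = Y[iy, 1]  # same ID order, so x_vals[k] and y_vals[k] pair up

difference = x_vals - y_vals
correlation = np.corrcoef(x_vals, y_vals)[0, 1]
print(common)
print(difference, correlation)
```

Because the IDs are stored as floats in a mixed array, this relies on them being whole numbers that compare exactly; that holds for integer IDs like these.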
Corralien