I searched Stack Overflow for a thread answering this question but could not find one. If this is a duplicate, please provide the link.
The use case is very common:
I have two arrays: `X`, which contains two-dimensional data points, and `y`, which contains labels that are either 0 or 1.
X has shape (307, 2)
y has shape (307, 1)
I want all rows of X whose corresponding row in y has the value 1.
I tried the following code:
X[y==1]
But it raises the following error:

IndexError: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1
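
For reference, the error can be reproduced with a sketch like the one below. The random data and the seed are stand-ins for my real arrays (assumptions for illustration); only the shapes match the ones described above.

```python
import numpy as np

# Stand-in data: values and seed are arbitrary, only the shapes matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(307, 2))            # 307 two-dimensional data points
y = rng.integers(0, 2, size=(307, 1))    # labels 0 or 1, stored as a column

mask = y == 1
print(mask.shape)  # the mask is 2-D, just like y

try:
    X[mask]
    raised = False
except IndexError as exc:
    # A 2-D boolean index must match X along both dimensions,
    # but the second dimension is 1 for the mask and 2 for X.
    raised = True
    print(exc)
```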

How can I do that?

Code Pope
  • You could try `X[y, :]` – Mad Physicist Aug 21 '19 at 12:17
  • @MadPhysicist This gives a totally different array -> shape = (307, 1, 2). This is not what I am looking for. Just the rows where the corresponding rows in `y` have a value of 1 -> shape = (9, 2) – Code Pope Aug 21 '19 at 12:27
  • @MadPhysicist And `y` is not an array of boolean values as described in the question. Thus to mask, you have to write a condition which then results in the `IndexError` I also mentioned in the question and have found the reason which is stated in the answer to this question – Code Pope Aug 21 '19 at 12:44
  • `X[y.ravel().astype(bool), :]` – Mad Physicist Aug 21 '19 at 12:57

1 Answer

I have found the following way:

X[np.where(np.any(y==1, axis=1))]

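I also found that the error above occurs because `y` has two dimensions: the mask `y==1` then has shape (307, 1), which cannot be matched against `X` with shape (307, 2). Flattening `y` to one dimension first also works, and it indexes with a boolean mask directly, which performs better than going through `np.where`: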

y = y.reshape(-1)
X[y==1,:]
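
A minimal end-to-end sketch of the fix, using randomly generated stand-in data with the same shapes as in the question. The data, the seed, and the names `mask`, `selected`, `alt`, and `via_where` are assumptions for illustration, not taken from my real code.

```python
import numpy as np

# Stand-in data with the shapes from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(307, 2))
y = rng.integers(0, 2, size=(307, 1))

# Flatten y so the boolean mask is 1-D and lines up with the rows of X.
mask = y.ravel() == 1        # shape (307,)
selected = X[mask]           # shape (number of ones in y, 2)

# Taking the single column of y gives the same 1-D mask.
alt = X[y[:, 0] == 1]

# The np.where variant selects the same rows via integer indices.
via_where = X[np.where(np.any(y == 1, axis=1))]

print(selected.shape)
```

All three expressions select the same rows; the plain boolean mask simply skips the intermediate index array that `np.where` builds.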
Code Pope
  • `np.any(y==1, axis=1)` is equivalent to `y.ravel()`. The call to `where` makes this fancy indexing, which is much less efficient than just using the mask. – Mad Physicist Aug 21 '19 at 12:15