I'm trying to get the indices of matching values in a NumPy array. I've also tried the intersect functions, to no avail. I simply want to find the values that two arrays have in common. One is 2D, from which I select a column, and the other is 1D, just a list of values to search for, so effectively these are two 1D arrays.

We'll call this array a:

 array([[    1, 97553,     1],
       [    1, 97587,     1],
       [    1, 97612,     1],
       [    1, 97697,     1],
       [    1, 97826,     3],
       [    1, 97832,     1],
       [    1, 97839,     1],
       [    1, 97887,     1],
       [    1, 97944,     1],
       [    1, 97955,     2]])

And say we're searching for values = numpy.array([97612, 97633, 97697, 97999, 97943, 97944])

So I try:

numpy.where(a[:, 1] == values)

I'd expect to get the indices of the matching values, but instead I get back an empty result: it outputs [(array([], dtype=int64),)].

If I try this though:

numpy.where(a[:, 1] == 97697)

It gives me back (array([3]),), which is what I would expect.

What quirk of arrays am I missing here? Or is there an easier way to do this? Finding array indices and matching arrays doesn't work the way I expect at all. When I want to find the union or intersection of arrays, by index or by unique value, it just doesn't seem to work. Any help would be great. Thanks.

Edit: As per Warren's request:

import numpy

a = numpy.array([[    1, 97553,     1],
       [    1, 97587,     1],
       [    1, 97612,     1],
       [    1, 97697,     1],
       [    1, 97826,     3],
       [    1, 97832,     1],
       [    1, 97839,     1],
       [    1, 97887,     1],
       [    1, 97944,     1],
       [    1, 97955,     2]])

values = numpy.array([97612, 97633, 97697, 97999, 97943, 97944])

I've found that numpy.in1d gives me a correct boolean mask for the operation: a 1D array of the same length as the column, which should map back to the original data. My only remaining issue is how to act on it, for instance deleting or modifying the rows of the original array at those indices. I could do it laboriously with a loop, but as far as I know there are better ways in NumPy. From what I have been able to find, boolean masks are supposed to be quite powerful in NumPy.

Will
  • *"If I try `numpy.intersect(a[:, 1], values)`, I should get back 97612, 97697, 97944. But I get something back that makes no sense."* I assume you mean `numpy.intersect1d`; there is no function `numpy.intersect`. Given the data that you show in the question, `np.intersect1d(a[:, 1], values)` returns `array([97612, 97697, 97944])`. Show *exactly* what you did, and show the unexpected result that you got. – Warren Weckesser Jun 25 '18 at 19:33
  • Possible duplicate of [Finding indices of matches of one array in another array](https://stackoverflow.com/questions/33678543/finding-indices-of-matches-of-one-array-in-another-array) – bobrobbob Jun 25 '18 at 21:06
  • @bobrobbob may be worth noting that the `np.searchsorted` based solution there gives wrong and confusing results if a value from `B` is not included in `A` (it gives you the index for sorted insertion) so it would need further processing to be used here. – filippo Jun 26 '18 at 04:41
  • @bobrobbob OP there does indeed require `B` is a sub-array of `A`, which is not the case here – filippo Jun 26 '18 at 04:47
  • @WarrenWeckesser the exact result was that it returned the entire array, and other than my typographic error of intersect1d versus intersect I have written precisely what I have done. It is impossible for the intersect to be all of the values, and yet it does. – Will Jun 26 '18 at 14:37
  • It would be great if you could create a [minimal, complete and verifiable example](https://stackoverflow.com/help/mcve) that we can run to reproduce that behavior. If what you say is true, that's a bug in the function. – Warren Weckesser Jun 26 '18 at 14:44
  • @WarrenWeckesser yes that's a good suggestion, thanks. I've added some simple and updated code, I'm getting somewhat better behaviour from 'in1d' in this one, though I still was not quite getting what I hoped from 'intersect1d'. Hopefully this truth table is a way forward. – Will Jun 26 '18 at 15:03
  • Thanks. Even better would be a *complete* example that shows the call to `numpy.intersect1d(a[:, 1], values)` and the result of that call. When I run `numpy.intersect1d(a[:, 1], values)` with your arrays `a` and `values` (using numpy 1.13.1), the result is `array([97612, 97697, 97944])`, as expected. – Warren Weckesser Jun 26 '18 at 17:04
  • @WarrenWeckesser to be honest I no longer have that bit of code as I've torn it up repeatedly to try to solve this. I did just run it against the separate example bit I coded and it gives me what I would expect, i.e. `[97612 97697 97944]`, so I assume I was giving the intersect function bad data somehow, have to skip that then. That leads me to the question of `numpy.where`, it seems confusing. But can I not use that truth mask I pull out from `numpy.in1d(a[:, 1], values)` with `numpy.where` to get the actual indices that match, i.e. `indices = numpy.where(numpy.in1d(a[:, 1], values))`? – Will Jun 26 '18 at 17:34
  • *"...I've torn it up repeatedly to try to solve this..."* Heh, I know the feeling. :) I agree that you probably were giving "bad" data to `intersect1d`. Since it is clear that the behavior of `intersect1d` is not part of the problem, you could remove the comments about it from the question. The way it is now just confuses the issue. – Warren Weckesser Jun 26 '18 at 18:27
  • @Will added a couple of examples of manipulations you can do either with the binary mask from `np.in1d` or the indices from `np.where`, see my edit – filippo Jun 26 '18 at 20:34
  • @WarrenWeckesser yup, that's a good idea, I've removed the bits about the intersect, hopefully it's less confused now for anyone else who has a similar issue. – Will Jul 03 '18 at 16:48

2 Answers

np.where with a single argument is equivalent to np.nonzero. It gives you the indices at which the condition (the input array) is True, returned as a tuple with one index array per dimension.
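
As a small illustration of that equivalence, here is a sketch using a made-up boolean array (`cond` is not from the question). It also shows why the result looks like a tuple wrapped around an array, which seems to be one source of confusion:

```python
import numpy as np

# Hypothetical condition array, only for illustration.
cond = np.array([False, True, False, True])

# With a single argument, np.where returns a tuple of index arrays,
# one per dimension of the input, just like np.nonzero.
idx_where = np.where(cond)
idx_nonzero = np.nonzero(cond)

# For a 1-D input the tuple has exactly one element; unwrap it with [0]
# to get a plain array of positions.
flat_idx = idx_where[0]
print(flat_idx)
```

Indexing with `[0]` is usually all that is needed to "squash" the tuple for 1-D inputs.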

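In your example you are checking for element-wise equality between a[:,1] and values, whose shapes, (10,) and (6,), cannot be broadcast against each other, so the whole comparison collapses to a single False: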

a[:, 1] == values
False

So it's giving you the correct result: no index in the input is True.
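
To see why the comparison cannot work element by element, it may help to look at the shapes. This is a minimal sketch that copies the column values from the question into `col`; the use of `np.broadcast` as a compatibility check is my own choice, not something from the original post. As far as I know, the scalar `False` shown above is what older NumPy versions return, and newer releases may raise an error for the same comparison, so treat the exact behaviour as version-dependent.

```python
import numpy as np

# Same numbers as a[:, 1] in the question.
col = np.array([97553, 97587, 97612, 97697, 97826,
                97832, 97839, 97887, 97944, 97955])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

print(col.shape, values.shape)

# Element-wise == needs broadcast-compatible shapes.
# np.broadcast raises ValueError when the shapes cannot be combined.
try:
    np.broadcast(col, values)
    compatible = True
except ValueError:
    compatible = False

print(compatible)
```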

You should use np.isin instead, which tests each element of a[:,1] for membership in values:

np.isin(a[:,1], values)
array([False, False,  True,  True, False, False, False, False,  True, False], dtype=bool)

Now you can use np.where to get the indices

np.where(np.isin(a[:,1], values))
(array([2, 3, 8]),)

and use those to address the original array

a[np.where(np.isin(a[:,1], values))]    
array([[    1, 97612,     1],
       [    1, 97697,     1],
       [    1, 97944,     1]])
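
Putting the steps above together, a self-contained sketch might look like the following. The `membership` name and the `hasattr` fallback are my additions, assuming an older install where `np.isin` is missing but `np.in1d` is present:

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

# Assumption: use np.isin when it exists, otherwise fall back to np.in1d.
membership = np.isin if hasattr(np, "isin") else np.in1d

mask = membership(a[:, 1], values)   # boolean array, one entry per row of a
rows = np.where(mask)[0]             # unwrap the 1-element tuple
matches = a[mask]                    # rows of a whose middle column is in values

print(rows)
print(matches)
```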

Your initial solution with a simple equality check could indeed have worked with proper broadcasting:

np.where(a[:,1] == values[..., np.newaxis])[1]
array([2, 3, 8])
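
For anyone curious what the broadcast comparison produces, here is a sketch that spells out the intermediate table. The names `eq`, `value_idx` and `row_idx` are mine. Note that the table has len(values) × len(a) entries, so this approach can use a lot of memory for large inputs.

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

# values[:, np.newaxis] has shape (6, 1); a[:, 1] has shape (10,).
# Broadcasting gives a (6, 10) table where eq[i, j] is True
# exactly when values[i] == a[j, 1].
eq = a[:, 1] == values[:, np.newaxis]

# First array: positions in `values`; second array: row positions in `a`.
value_idx, row_idx = np.where(eq)

# Collapsing over the `values` axis yields a per-row mask for `a`.
mask = eq.any(axis=0)

print(eq.shape, value_idx, row_idx)
```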

EDIT: since you seem to have trouble using the above results to index and manipulate your array, here are a couple of simple examples.

Now you should have two ways of accessing your matching elements in the original array: either the boolean mask or the indices from np.where.

mask = np.isin(a[:,1], values)  # np.in1d if np.isin is not available
idx = np.where(mask)

Let's say you want to set all matching rows to zero

a[mask] = 0   # or a[idx] = 0
array([[    1, 97553,     1],
       [    1, 97587,     1],
       [    0,     0,     0],
       [    0,     0,     0],
       [    1, 97826,     3],
       [    1, 97832,     1],
       [    1, 97839,     1],
       [    1, 97887,     1],
       [    0,     0,     0],
       [    1, 97955,     2]])

Or you want to multiply the third column of matching rows by 100

a[mask, 2] *= 100
array([[    1, 97553,     1],
       [    1, 97587,     1],
       [    1, 97612,   100],
       [    1, 97697,   100],
       [    1, 97826,     3],
       [    1, 97832,     1],
       [    1, 97839,     1],
       [    1, 97887,     1],
       [    1, 97944,   100],
       [    1, 97955,     2]])
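
Both assignments above modify the array in place. If the original data should stay untouched, one option is to work on copies, as in this sketch (`zeroed` and `scaled` are illustrative names of mine):

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

mask = np.isin(a[:, 1], values)   # np.in1d on NumPy older than 1.13

# Zero out the matching rows on a copy, leaving `a` unchanged.
zeroed = a.copy()
zeroed[mask] = 0

# Scale the third column of the matching rows on another copy.
scaled = a.copy()
scaled[mask, 2] *= 100

print(zeroed)
print(scaled)
```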

Or you want to delete matching rows (here using indices is more convenient than masks)

np.delete(a, idx, axis=0)
array([[    1, 97553,     1],
       [    1, 97587,     1],
       [    1, 97826,     3],
       [    1, 97832,     1],
       [    1, 97839,     1],
       [    1, 97887,     1],
       [    1, 97955,     2]])
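
A mask can also do the deletion if you invert it and keep the rows that do not match. The sketch below compares that with `np.delete`; both return a new array rather than changing `a`. The names `kept` and `deleted` are mine.

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

mask = np.isin(a[:, 1], values)   # np.in1d on NumPy older than 1.13

# Keep every row whose middle column is NOT in values.
kept = a[~mask]

# Same result via np.delete with explicit row indices.
deleted = np.delete(a, np.where(mask)[0], axis=0)

print(np.array_equal(kept, deleted))
```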
filippo
  • 'numpy.isin' was my first thought, however it doesn't appear to exist in python 2.7, when I try to use it I get the error message that there is no such attribute in the module numpy. There must be some way to achieve what I'm attempting. – Will Jun 26 '18 at 14:39
  • @Will which NumPy version? `np.isin` was introduced in `1.13.0` but you should be able to obtain the same result with `np.in1d`. Actually I was using `np.in1d` in a previous revision of my answer, I changed it because current docs recommend `np.isin` in new code. – filippo Jun 26 '18 at 15:28
  • apparently version 1.11.0, 'in1d' works for me currently. Not sure how to upgrade just numpy without breaking the current install of python 2.7. – Will Jun 26 '18 at 17:04

Just a thought:

Try to flatten the 2D array and compare using numpy.intersect1d.

https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.flatten.html

https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.intersect1d.html
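
A sketch of how this idea could be made to return indices: rather than flattening the whole 2D array, which would mix the three columns together, select the column (which is already 1-D) and ask `numpy.intersect1d` for positions. This assumes a NumPy version where `intersect1d` accepts `return_indices=True` (added in 1.15, as far as I know), so it would not work on the asker's 1.11 install. It reports only the first occurrence of each common value.

```python
import numpy as np

a = np.array([[1, 97553, 1], [1, 97587, 1], [1, 97612, 1], [1, 97697, 1],
              [1, 97826, 3], [1, 97832, 1], [1, 97839, 1], [1, 97887, 1],
              [1, 97944, 1], [1, 97955, 2]])
values = np.array([97612, 97633, 97697, 97999, 97943, 97944])

# common: sorted values present in both arrays
# a_idx: position of each common value in a[:, 1]
# values_idx: position of each common value in values
common, a_idx, values_idx = np.intersect1d(a[:, 1], values,
                                           return_indices=True)

print(common, a_idx, values_idx)
```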

Nishad Hameed
  • That's an interesting thought, as sometimes when I use things like 'numpy.where' I get back a tuple of arrays which I don't expect and it causes some issues. Can't flatten that, but I try to situationally squash those. If I flatten the entire 2D array I get a very long (three times longer of course) array that doesn't have indices that match, but I could possibly div the values to get the original indices back. Either way I think I can use the truth table I've gotten out with the 'numpy.where' function against the original array. Going to try that and see what I get. – Will Jun 26 '18 at 17:06