I have an array of strings, arr, in which I want to search for an element and get its index. NumPy has a where function that searches for an element and returns the index in tuple form.

import numpy
arr = numpy.array(["string1","string2","string3"])
print(numpy.where(arr == "string1"))

It prints:

(array([0], dtype=int64),)

But I only want the index number 0.

I tried this:

i = numpy.where(arr == "string1")
print("idx = {}".format(i[0]))

which has output:

idx = [0]

Is there any way to get the index number itself without using a replace or slicing method?

joanis
lokp
  • I find this helpful [Extract the index from ](https://stackoverflow.com/a/23994923/15633731) – Yosef.Schwartz Oct 19 '22 at 14:07
  • What's the problem with `np.where(arr)[0][0]`? – dankal444 Oct 19 '22 at 14:54
  • `np.where(arr)[0][0]` evaluates to 0 and prints 0 on my machine (which is the correct index)... – Jérôme Richard Oct 19 '22 at 15:01
  • @lokp I have no idea how you got such a result. I guess the answer by joanis will clarify something for you. – dankal444 Oct 19 '22 at 15:05
  • @dankal444 careful here, you said `np.where(arr)` in your comment above, but that's going to return every index in `arr` which is non-zero/non-false, i.e., all indices (`(array([0, 1, 2], dtype=int64),)`). You need `np.where(arr == "string1")[0][0]`, not `np.where(arr)[0][0]` – joanis Oct 19 '22 at 15:07
  • @dankal444 yes you are right – lokp Oct 19 '22 at 15:10
  • @joanis yes, sure. I just wanted to write `np.where(arrayName)[0][0]` to show how to get the index instead of the tuple. I should have chosen a different name for the input array; I did not notice it's the same as in the OP's question. – dankal444 Oct 19 '22 at 15:11

1 Answer

TL;DR

Use:

try:
    i = numpy.where(arr == "string1")[0][0]
except IndexError:
    # handle the case where "string1" was not found in arr
    i = None

or

indices = list(numpy.where(arr == "string1")[0])

Details

Finding elements in NumPy arrays is not intuitive the first time you try to do it.

Let's decompose the operation:

>>> arr = numpy.array(["string1","string2","string3"])
>>> arr == "string1"
array([ True, False, False])

Notice how just doing arr == "string1" is already doing the search: it's returning an array of booleans of the same shape as arr telling us where the condition is true.

Then, you're using numpy.where which, when used with only one parameter (the condition), returns the indices where its input is non-zero. With booleans, that means where it is True.

>>> numpy.where(numpy.array([ True, False, False]))
(array([0], dtype=int64),)
>>> numpy.where(arr == "string1")
(array([0], dtype=int64),)

It may not be obvious why this gives you a tuple of arrays for a 1-D input, but numpy.where returns one index array per dimension, so when you use this syntax with a 2-D input, it makes more sense.
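
To illustrate the 2-D case, here is a small sketch (the array `grid` is made up for this example). With a 2-D input, the tuple holds two arrays, the row indices and the column indices of the matches:

```python
import numpy

# Hypothetical 2-D array, used only for illustration.
grid = numpy.array([["a", "b"],
                    ["b", "a"]])

# One index array per dimension: rows first, then columns.
rows, cols = numpy.where(grid == "a")

# Pair up the row and column index of each match.
matches = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(matches)  # [(0, 0), (1, 1)]
```

For a 1-D input there is only one dimension, hence the one-element tuple.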

In any case, what you're getting here is a tuple containing an array of indices where the condition matches. It has to be a sequence rather than a single number, because you might have multiple matches.

For your code, you want numpy.where(arr == "string1")[0][0], because you know "string1" occurs in the array. In general, though, the inner array may contain zero values or more than one, depending on how many times the string is found.

>>> arr2 = numpy.array(["string1","string2","string3","string1", "string3"])
>>> numpy.where(arr2 == "foo")
(array([], dtype=int64),)
>>> numpy.where(arr2 == "string3")
(array([2, 4], dtype=int64),)

So when you want to use these indices, you should simply treat numpy.where(arr == "string1")[0] as a list (it's really a 1-D array, though) and continue from there.

Now, just using numpy.where(arr == "some string")[0][0] is risky, because it will throw an IndexError exception if the string is not found in arr. If you really want to do that, do it in a try/except block.
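
One way to package that try/except is a small helper. This is only a sketch: `first_index` is a hypothetical name, not a NumPy function, and returning None for a missing value is an assumption about what the caller wants.

```python
import numpy

def first_index(arr, value):
    """Return the index of the first match as a plain int, or None if absent."""
    try:
        return int(numpy.where(arr == value)[0][0])
    except IndexError:
        # The index array was empty: value does not occur in arr.
        return None

arr = numpy.array(["string1", "string2", "string3"])
print(first_index(arr, "string1"))  # 0
print(first_index(arr, "foo"))      # None
```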

If you need the list of indices as a Python list, you can do this:

indices = list(numpy.where(arr == "string1")[0])
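
Note that list(...) keeps the elements as NumPy integers. If plain Python ints are wanted, a possible variant (a sketch, not the only option) uses the array's tolist() method together with numpy.flatnonzero, which returns the index array directly instead of wrapping it in a tuple:

```python
import numpy

arr2 = numpy.array(["string1", "string2", "string3", "string1", "string3"])

# flatnonzero gives the matching indices as a 1-D array (no tuple);
# tolist() converts them to plain Python ints.
indices = numpy.flatnonzero(arr2 == "string3").tolist()
print(indices)  # [2, 4]
```
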
joanis
  • Having the option to let np.where() return n indices (and setting n=1) would be incredibly useful. Now it looks ugly... – Jan M. Feb 23 '23 at 10:59