0

I have two NumPy arrays:

valid_year = np.array([1999, 2005, 2007, 2010])
full_year = np.array([1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010])

How do I find the indices in full_year of the years that appear in valid_year, without using any loop or search function?

The result should be np.array([0, 6, 8, 11]).

mozway
DiaryWolf
  • duplicate of https://stackoverflow.com/questions/32191029/getting-the-indices-of-several-elements-in-a-numpy-array-at-once –  Oct 14 '21 at 06:17
  • No loop and no search? It sounds like you tied your hands badly... – MSH Oct 14 '21 at 06:19

3 Answers

1

Assuming this input:

year = np.array([1999, 2005, 2007, 2010])
full_year = np.array([1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010])

You can use a broadcast comparison followed by nonzero:

(year == full_year[:,None]).any(1).nonzero()[0]

output:

array([ 0,  6,  8, 11])
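
To make the broadcasting step easier to follow, here is a minimal, self-contained sketch of the same one-liner split into stages. The intermediate names `matches` and `indices` are only illustrative, and `np.arange` is used as shorthand for the full year list.

```python
import numpy as np

year = np.array([1999, 2005, 2007, 2010])
full_year = np.arange(1999, 2011)  # 1999 .. 2010 inclusive

# Shape (12, 1) compared with shape (4,) broadcasts to a (12, 4) boolean matrix:
# matches[i, j] is True when full_year[i] == year[j].
matches = full_year[:, None] == year

# A row containing at least one True marks a position in full_year
# whose value appears in year.
indices = matches.any(axis=1).nonzero()[0]
print(indices)  # [ 0  6  8 11]
```

Note that the intermediate matrix has len(full_year) * len(year) elements, so this is probably best kept to arrays of moderate size.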
mozway
1

We can assume that the full range of years consists of consecutive years. With that in mind, subtracting the minimum year of the range from each valid year gives that year's index directly.

>>> valid = np.array([1999, 2005, 2007])
>>> year_range = np.array([1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007])

Subtracting the minimum year gives the result. You may also need np.where to make sure the indices do not fall outside the year range.

>>> valid - np.min(year_range)
array([ 0,  6,  8])

There you go: no explicit loops or search functions. To be honest, though, both the subtraction and np.where are loops that NumPy performs behind the scenes in compiled code. There is no actual way to avoid loops in this situation.
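
The bounds check mentioned above could look roughly like this. It is only a sketch: it uses a boolean mask rather than np.where, the names `offsets` and `in_range` are made up for illustration, and it still relies on the assumption that `year_range` holds consecutive years.

```python
import numpy as np

# 1998 and 2012 are deliberately outside the range to show the filtering.
valid = np.array([1998, 1999, 2005, 2007, 2012])
year_range = np.arange(1999, 2011)  # consecutive years 1999 .. 2010

# Offset of each valid year from the first year of the range.
offsets = valid - year_range.min()

# Keep only offsets that are real positions in year_range.
in_range = (offsets >= 0) & (offsets < year_range.size)
indices = offsets[in_range]
print(indices)  # [0 6 8]
```

Because the range is assumed to be consecutive, any year that passes the bounds check is guaranteed to be present in it.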

Troll
  • Very smart! +1 Regarding the use of loops, of course loops are everywhere, I guess only explicit loops are unwanted ;). NB. this wouldn't work however if a year in `valid` is not present in `year_range` – mozway Oct 14 '21 at 06:30
0

The np.isin() function could be used:

import numpy as np

valid_year = np.array([1999, 2005, 2007, 2010])
full_range_of_year = np.array(
    [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
)

boolean_mask = np.isin(full_range_of_year, valid_year)

print(boolean_mask)
# [ True False False False False False  True False  True False False  True]

print(np.nonzero(boolean_mask))
# (array([ 0,  6,  8, 11]),)
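
np.nonzero returns a tuple with one array per dimension, which is why the output above is wrapped in parentheses. If a plain index array is wanted, as in the question, np.flatnonzero is one option. A short sketch, with `np.arange` standing in for the full year list and `indices` as an illustrative name:

```python
import numpy as np

valid_year = np.array([1999, 2005, 2007, 2010])
full_range_of_year = np.arange(1999, 2011)  # 1999 .. 2010 inclusive

# flatnonzero returns the indices of the True entries as a single 1-D array.
indices = np.flatnonzero(np.isin(full_range_of_year, valid_year))
print(indices)  # [ 0  6  8 11]
```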
xdze2