Using iloc with np.where:
idx = next(iter(df['number'].iloc[np.where(df['color'].eq('blue'))]), -1) # 4
Note this also handles the case where the colour does not exist. In comparison, df['color'].eq('orange').idxmax() gives 0 even though 'orange' does not exist in the series. The above logic will give -1.
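To make the difference concrete, here is a minimal sketch. The sample df and the helper name first_number are assumptions for illustration (the original df is not shown in full), and np.flatnonzero is swapped in for np.where so the positions come back as a flat array rather than a tuple:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so that the first 'blue' row has number 4.
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'blue', 'red'],
    'number': [7, 9, 4, 5, 6],
})

def first_number(df, color, default=-1):
    # Integer positions of the rows whose color matches.
    pos = np.flatnonzero(df['color'].eq(color).to_numpy())
    # next() with a default avoids an exception when nothing matches.
    return next(iter(df['number'].iloc[pos]), default)

found = first_number(df, 'blue')
missing = first_number(df, 'orange')

# idxmax on an all-False mask still returns the first index label.
idxmax_missing = df['color'].eq('orange').idxmax()

print(found, missing, idxmax_missing)
```

With this data, the helper returns the sentinel for 'orange', while idxmax cannot distinguish "no match" from "match at the first row".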
numba
I'm wondering if there is any more optimal approach given that I only ever need the first occurrence.
Yes! For a more efficient solution, see Efficiently return the index of the first value satisfying condition in array. Numba allows you to iterate row-wise efficiently. In this case, you will need to factorize your strings first so that you feed numeric arrays only to Numba:
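import numpy as np
import pandas as pd
from numba import njit

# factorize series, pd.factorize maintains order,
# i.e. first item in values gives 0 index
idx, values = pd.factorize(df['color'])
# integer code assigned to 'blue' (raises IndexError if 'blue' is absent)
idx_search = np.where(values == 'blue')[0][0]

@njit
def get_first_index_nb(A, k):
    # return the position of the first element equal to k, or -1 if none
    for i in range(len(A)):
        if A[i] == k:
            return i
    return -1

res = df['number'].iat[get_first_index_nb(idx, idx_search)]  # 4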
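Of course, for a one-off calculation, this is inefficient, since factorizing already requires a full pass over the series. But for successive lookups on the same series, the solution will likely be considerably faster than solutions which check for equality across the entire series / array, because the loop returns as soon as it reaches the first match.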
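As a rough sketch of the repeated-lookup case, you can factorize once and reuse the codes for every search. The sample df, the code_of dictionary and the first_number helper are assumptions for illustration, and the try/except fallback exists only so the snippet still runs (slowly) where Numba is not installed:

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run the plain Python loop.
    def njit(func):
        return func

@njit
def get_first_index_nb(A, k):
    # Position of the first element equal to k, or -1 if there is none.
    for i in range(len(A)):
        if A[i] == k:
            return i
    return -1

# Hypothetical sample data.
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'blue', 'red'],
    'number': [7, 9, 4, 5, 6],
})

# Factorize once; every later lookup reuses the integer codes.
codes, uniques = pd.factorize(df['color'])
code_of = {label: code for code, label in enumerate(uniques)}

def first_number(color, default=-1):
    code = code_of.get(color)
    if code is None:
        # The label never occurs, so there is no need to scan at all.
        return default
    pos = get_first_index_nb(codes, code)
    # Guard against -1, which .iat would treat as "last row".
    return default if pos == -1 else df['number'].iat[pos]

results = {c: first_number(c) for c in ['blue', 'red', 'orange']}
print(results)
```

The guard on -1 matters because passing -1 straight to .iat would silently return the last row instead of signalling a missing value.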