Find element's index in pandas Series

Question

I know this is a very basic question, but for some reason I can't find an answer. How can I get the index of a certain element of a Series in Python pandas? (The first occurrence would suffice.)

I.e., I'd like something like:

import pandas as pd
myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
print(myseries.find(7))  # should output 3

Certainly, it is possible to define such a method with a loop:

def find(s, el):
    for i in s.index:
        if s[i] == el:
            return i
    return None

print(find(myseries, 7))

but I assume there should be a better way. Is there?

score 291 · Accepted Answer · edited Nov 07 '16 at 14:03


>>> myseries[myseries == 7]
3    7
dtype: int64
>>> myseries[myseries == 7].index[0]
3

I admit there should be a better way to do this, but it at least avoids an explicit Python loop over the object and moves the comparison to the C level.
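A minimal sketch of the same boolean-mask lookup, wrapped so that a missing value yields None instead of an IndexError. The helper name first_index_of is hypothetical; it is not part of pandas.

```python
import pandas as pd

def first_index_of(series, value):
    """Return the index label of the first element equal to value, or None."""
    matches = series.index[series == value]  # labels of all matching elements
    return matches[0] if len(matches) else None

myseries = pd.Series([1, 4, 0, 7, 5], index=[0, 1, 2, 3, 4])
result_found = first_index_of(myseries, 7)     # label of the first 7
result_missing = first_index_of(myseries, 10)  # 10 is absent, so None
```

The comparison still runs vectorised; only the final emptiness check happens in Python.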

edited Nov 07 '16 at 14:03

Jonathan Eunice


answered Aug 20 '13 at 05:52

Viktor Kerkez


16

The trouble here is it assumes the element being searched for is actually in the list. It's a bummer pandas doesn't seem to have a built in find operation. – jxramos Aug 23 '17 at 17:16
12

This solution only works if your series has a sequential integer index. If your series index is by datetime, this doesn't work. – Andrew Medlin Jul 07 '18 at 11:45
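For what it is worth, the boolean mask selects rows and then reads whatever labels the index holds, so the accepted approach appears to carry over to non-integer indexes. A small check, assuming a daily DatetimeIndex made up for illustration:

```python
import pandas as pd

# Hypothetical dates, used only to give the series a datetime index.
dates = pd.date_range("2021-01-01", periods=5, freq="D")
ts = pd.Series([1, 4, 0, 7, 5], index=dates)

# Same expression as the accepted answer; the result is a Timestamp label.
label = ts[ts == 7].index[0]
```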

Jeff · Answer 2 · 2013-08-20T12:19:16.907

Converting to an Index, you can use get_loc

In [1]: myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])

In [3]: pd.Index(myseries).get_loc(7)
Out[3]: 3

In [4]: pd.Index(myseries).get_loc(10)
KeyError: 10

Duplicate handling

In [5]: pd.Index([1,1,2,2,3,4]).get_loc(2)
Out[5]: slice(2, 4, None)

Returns a boolean array if the matches are non-contiguous

In [6]: pd.Index([1,1,2,1,3,2,4]).get_loc(2)
Out[6]: array([False, False,  True, False, False,  True, False], dtype=bool)
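Because get_loc can hand back an integer, a slice, or a boolean array, callers often want a single shape. The sketch below normalises all three into a list of positions; positions_of is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

def positions_of(values, target):
    """Return every integer position of target in values as a list."""
    loc = pd.Index(values).get_loc(target)  # raises KeyError if target is absent
    if isinstance(loc, slice):
        # Duplicates in one contiguous run come back as a slice.
        return list(range(loc.start, loc.stop, loc.step or 1))
    if isinstance(loc, np.ndarray):
        # Scattered duplicates come back as a boolean mask.
        if loc.dtype == bool:
            return np.flatnonzero(loc).tolist()
        return loc.tolist()
    return [int(loc)]  # a unique match is a plain integer

unique_hit = positions_of([1, 4, 0, 7, 5], 7)
contiguous = positions_of([1, 1, 2, 2, 3, 4], 2)
scattered = positions_of([1, 1, 2, 1, 3, 2, 4], 2)
```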

get_loc uses a hashtable internally, so it is fast

In [7]: s = Series(randint(0,10,10000))

In [9]: %timeit s[s == 5]
1000 loops, best of 3: 203 µs per loop

In [12]: i = Index(s)

In [13]: %timeit i.get_loc(5)
1000 loops, best of 3: 226 µs per loop

As Viktor points out, there is a one-time overhead to creating an index (it's incurred when you actually DO something with the index, e.g. calling is_unique)

In [2]: s = Series(randint(0,10,10000))

In [3]: %timeit Index(s)
100000 loops, best of 3: 9.6 µs per loop

In [4]: %timeit Index(s).is_unique
10000 loops, best of 3: 140 µs per loop

@Jeff if you have a more interesting index it's not quite so easy... but I guess you can just do `s.index[_]` – Andy Hayden, Aug 20 '13 at 15:21
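Spelled out, the `s.index[_]` step means: get_loc gives a position within the values, and indexing the series' own index at that position recovers the label. A short sketch, assuming a string index chosen only for illustration:

```python
import pandas as pd

s = pd.Series([1, 4, 0, 7, 5], index=list("abcde"))

pos = pd.Index(s).get_loc(7)  # integer position within the values
label = s.index[pos]          # label stored at that position
```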

Bill · Answer 3 · 2022-02-19T05:24:30.173

I'm impressed with all the answers here. This is not a new answer, just an attempt to summarize the timings of all these methods. I considered a series with 25 elements and assumed the general case: the index can contain any values, and you want the index value corresponding to a search value that sits towards the end of the series.

Here are the speed tests on a 2012 Mac Mini in Python 3.9.10 with Pandas version 1.4.0.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: data = [406400, 203200, 101600, 76100, 50800, 25400, 19050, 12700, 9500,
   ...:         6700, 4750, 3350, 2360, 1700, 1180, 850, 600, 425, 300, 212, 150,
   ...:         106, 75, 53, 38]

In [4]: myseries = pd.Series(data, index=range(1,26))

In [5]: assert(myseries[21] == 150)

In [6]: %timeit myseries[myseries == 150].index[0]
179 µs ± 891 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit myseries[myseries == 150].first_valid_index()
205 µs ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: %timeit myseries.where(myseries == 150).first_valid_index()
597 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit myseries.index[np.where(myseries == 150)[0][0]]
110 µs ± 872 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: %timeit pd.Series(myseries.index, index=myseries)[150]
125 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: %timeit myseries.index[pd.Index(myseries).get_loc(150)]
49.5 µs ± 814 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit myseries.index[list(myseries).index(150)]
7.75 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit myseries.index[myseries.tolist().index(150)]
2.55 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit dict(zip(myseries.values, myseries.index))[150]
9.89 µs ± 79.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [15]: %timeit {v: k for k, v in myseries.items()}[150]
9.99 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
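One caveat with the two dictionary-based lookups: when the series contains duplicate values, later entries overwrite earlier ones, so the dictionary returns the last matching label rather than the first. A small sketch, assuming a series with a repeated 7 that is not part of the timing data:

```python
import pandas as pd

s = pd.Series([1, 7, 0, 7, 5], index=list("abcde"))

# Built front to back: the second 7 overwrites the first.
last_wins = dict(zip(s.values, s.index))[7]

# Built back to front: the first occurrence is written last and survives.
first_wins = dict(zip(s.values[::-1], s.index[::-1]))[7]
```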

@Jeff's answer seems to be the fastest - although it doesn't handle duplicates.

Correction: Sorry, I missed one, @Alex Spangher's solution using the list index method is by far the fastest.

Update: Added @EliadL's answer.

Hope this helps.

Amazing that such a simple operation requires such convoluted solutions and many are so slow. Over half a millisecond in some cases to find a value in a series of 25.

2022-02-18 Update

Updated all the timings with the latest Pandas version and Python 3.9. Even on an older computer, all the timings dropped significantly (by 10 to 70%) compared to the previous tests (version 0.25.3).

Plus: Added two more methods utilizing dictionaries.

Thanks. But shouldn't you be measuring *after* `myindex` is created, since it only needs to be created once? – EliadL, Jan 01 '20 at 23:24
You could argue that, but it depends on how many look-ups like this are required. It's only worth creating the `myindex` series if you are going to do the look-up many times. For this test I assumed it was only needed once and the total execution time was important. – Bill, Jan 02 '20 at 21:57
Just ran into the need to do this tonight, and using .get_loc() on the same Index object across multiple lookups seems like it should be the fastest. I think an improvement to the answer would be to provide both timings: one including the Index creation, and another of only the lookup after it has been created. – Rick, May 14 '20 at 02:28
Yes, good point. @EliadL also said that. It depends on whether the series stays static in your application. If any values in the series change, you need to rebuild `pd.Index(myseries)`. To be fair to the other methods I assumed the original series might have changed since the last lookup. – Bill, May 14 '20 at 17:06
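The reuse pattern discussed in these comments could look roughly like this: build the Index once, then run every lookup against it. This is a sketch that assumes the values are unique and the series does not change between lookups; lookup and value_index are hypothetical names, and the data is a shortened stand-in for the timing series.

```python
import pandas as pd

myseries = pd.Series([406400, 9500, 150, 38], index=[1, 9, 21, 25])

# Built once; must be rebuilt if any value in myseries changes.
value_index = pd.Index(myseries)

def lookup(value):
    """Return the label in myseries whose value equals the given value."""
    return myseries.index[value_index.get_loc(value)]

labels = [lookup(v) for v in (150, 38, 9500)]
```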

Alon · Answer 4 · 2015-04-08T08:51:51.380

15

In [92]: (myseries==7).argmax()
Out[92]: 3

This works if you know 7 is there in advance. You can check this with (myseries==7).any()

Another approach (very similar to the first answer) that also accounts for multiple 7's (or none) is

In [122]: myseries = pd.Series([1,7,0,7,5], index=['a','b','c','d','e'])
In [123]: list(myseries[myseries==7].index)
Out[123]: ['b', 'd']

edited Apr 08 '15 at 08:51

answered Apr 08 '15 at 08:12

Alon


The point about knowing 7 is an element in advance is right on. However using an `any` check is not ideal since a double iteration is needed. There's a cool post op check that will unveil all `False` conditions you can see [here](https://stackoverflow.com/a/45846361/1330381). – jxramos Aug 23 '17 at 18:07
2

Careful, if no element matches this condition, `argmax` will still return 0 (instead of erroring out). – cs95 Jan 23 '19 at 21:29
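One way to guard against that silent 0 without a separate any() pass is to check the mask at the position argmax returned. This is a sketch, not necessarily the check from the linked post; first_position is a hypothetical helper.

```python
import pandas as pd

def first_position(series, value):
    """Return the integer position of the first match, or None if absent."""
    mask = (series == value).to_numpy()
    if mask.size == 0:
        return None  # argmax would raise on an empty array
    pos = int(mask.argmax())  # 0 when nothing matches, so verify below
    return pos if mask[pos] else None

myseries = pd.Series([1, 4, 0, 7, 5])
hit = first_position(myseries, 7)
miss = first_position(myseries, 10)
```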

score 11 · Answer 5 · answered Sep 17 '14 at 20:09

Another way to do this, although equally unsatisfying, is:

s = pd.Series([1,3,0,7,5],index=[0,1,2,3,4])

list(s).index(7)

returns: 3

On time tests using a current dataset I'm working with (consider it random):

In [64]: %timeit pd.Index(article_reference_df.asset_id).get_loc('100000003003614')
10000 loops, best of 3: 60.1 µs per loop

In [66]: %timeit article_reference_df.asset_id[article_reference_df.asset_id == '100000003003614'].index[0]
1000 loops, best of 3: 255 µs per loop


In [65]: %timeit list(article_reference_df.asset_id).index('100000003003614')
100000 loops, best of 3: 14.5 µs per loop

score 7 · Answer 6 · answered Sep 05 '16 at 00:01


If you use numpy, you can get an array of the indices at which your value is found:

import numpy as np
import pandas as pd
myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
np.where(myseries == 7)

This returns a one-element tuple containing an array of the indices where 7 is the value in myseries:

(array([3], dtype=int64),)
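Note that np.where reports integer positions, which only coincide with the index labels when the index is the default 0..n-1. A sketch of mapping positions back to labels, assuming a string index and a repeated 7 chosen for illustration:

```python
import numpy as np
import pandas as pd

myseries = pd.Series([1, 7, 0, 7, 5], index=list("abcde"))

# Integer positions of every match.
positions = np.where((myseries == 7).to_numpy())[0]

# Translate positions into the series' own labels.
labels = myseries.index[positions].tolist()
```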

Alex

This is the best solution that I found. – Hadi Rohani Jun 03 '21 at 19:47
If using a dataframe, you can also use .values in combination with np.where / np.argwhere. To find the indices of all non-zero elements, it would be: np.argwhere(df['Column'].values) – Evan W. Apr 24 '22 at 16:13

score 5 · Answer 7 · answered Mar 25 '17 at 05:15


You can use Series.idxmax(), which returns the index label of the maximum value (in this example the maximum happens to be 7):

>>> import pandas as pd
>>> myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
>>> myseries.idxmax()
3

Raki Gade

6

This appears to only return the index where the max element is found, not a specific `index of certain element` like the question asked. – jxramos May 30 '17 at 19:58
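idxmax can still serve the original question if it is applied to the boolean mask rather than to the values, since True is the maximum of a boolean series. A sketch with a guard for the no-match case; first_label is a hypothetical helper name.

```python
import pandas as pd

def first_label(series, value):
    """Return the label of the first element equal to value, or None."""
    mask = series == value
    # Without the any() guard, idxmax would return the first label
    # even when nothing matches.
    return mask.idxmax() if mask.any() else None

myseries = pd.Series([1, 4, 0, 7, 5], index=list("abcde"))
found = first_label(myseries, 7)
absent = first_label(myseries, 10)
```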

EliadL · Answer 8 · 2020-01-02T11:17:12.037

4

This is the most native and scalable approach I could find:

>>> myindex = pd.Series(myseries.index, index=myseries)

>>> myindex[7]
3

>>> myindex[[7, 5, 7]]
7    3
5    4
7    3
dtype: int64
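With this reversed series, `myindex[10]` raises a KeyError when 10 is not among the values. A sketch using Series.get to fall back to None instead, assuming unique values and a string index chosen for illustration:

```python
import pandas as pd

myseries = pd.Series([1, 4, 0, 7, 5], index=list("abcde"))

# Values become the index, original labels become the data.
myindex = pd.Series(myseries.index, index=myseries)

found = myindex.get(7)     # label of the element equal to 7
missing = myindex.get(10)  # None, because 10 is not a value in myseries
```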

edited Jan 02 '20 at 11:17

answered Jan 01 '20 at 13:09

EliadL


score 2 · Answer 9 · answered Oct 29 '19 at 22:02


Another way to do it that hasn't been mentioned yet is the tolist method:

myseries.tolist().index(7)

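This returns the zero-based position of the first match, which equals the index label when the Series has the default integer index. It assumes the value exists in the Series; otherwise list.index raises a ValueError.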
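A sketch that wraps the tolist approach so it returns the label rather than the position and tolerates a missing value. find_label is a hypothetical helper name.

```python
import pandas as pd

def find_label(series, value):
    """Return the label of the first element equal to value, or None."""
    try:
        pos = series.tolist().index(value)  # position of the first match
    except ValueError:
        return None  # value is not in the series
    return series.index[pos]

myseries = pd.Series([1, 4, 0, 7, 5], index=list("abcde"))
label = find_label(myseries, 7)
nothing = find_label(myseries, 10)
```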

rmutalik

1

@Alex Spangher suggested something similar on Sep 17 '14. See his answer. I have now added both versions to the test results. – Bill Jan 01 '20 at 19:59

score 1 · Answer 10 · answered Aug 21 '18 at 09:49


Often your value occurs at multiple indices:

>>> myseries = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1])
>>> myseries.index[myseries == 1]
Int64Index([3, 4, 5, 6, 10, 11], dtype='int64')

Ulf Aslak

score 1 · Answer 11 · answered Jul 10 '21 at 06:23

Pandas has a built-in Index class with a method called get_loc. This method returns one of:

an integer (the position of the element, if it occurs once)
a slice (if the value repeats in a consecutive run)
an array (a boolean array, if the value occurs at multiple separate positions)

Example:

>>> import pandas as pd

>>> mySer = pd.Series([1, 3, 8, 10, 13])
>>> pd.Index(mySer).get_loc(10)  # Returns index
3  # Index of 10 in series

>>> mySer = pd.Series([1, 3, 8, 10, 10, 10, 13])
>>> pd.Index(mySer).get_loc(10)  # Returns slice
slice(3, 6, None)  # 10 occurs at index 3 (included) to 6 (not included)

# If the data is not in sequence then it returns an array of bools.
>>> mySer = pd.Series([1, 10, 3, 8, 10, 10, 10, 13, 10])
>>> pd.Index(mySer).get_loc(10)
array([False,  True, False, False,  True,  True,  True, False,  True])

There are many other options too, but I found this one the simplest.

score 0 · Answer 12 · answered Mar 09 '22 at 09:49


The index of the filtered DataFrame gives you the label of each matching row:

my_fl2 = (df['ConvertedCompYearly'] == 45241312)
print(df[my_fl2].index)

   
Name: ConvertedCompYearly, dtype: float64
Int64Index([66910], dtype='int64')

salim ep
