Finding label location in a DataFrame Index

Question

I have a pandas dataframe:

import pandas as pnd
d = pnd.Timestamp('2013-01-01 16:00')
dates = pnd.bdate_range(start=d, end = d+pnd.DateOffset(days=10), normalize = False)

df = pnd.DataFrame(index=dates, columns=['a'])
df['a'] = 6

print(df)
                     a
2013-01-01 16:00:00  6
2013-01-02 16:00:00  6
2013-01-03 16:00:00  6
2013-01-04 16:00:00  6
2013-01-07 16:00:00  6
2013-01-08 16:00:00  6
2013-01-09 16:00:00  6
2013-01-10 16:00:00  6
2013-01-11 16:00:00  6

I am interested in finding the integer location of one of the labels, say,

ds = pnd.Timestamp('2013-01-02 16:00')

Looking at the index values, I know that the integer location of this label is 1. How can I get pandas to tell me the integer location of this label?

As a little aside, the traditional alias for pandas is `pd` :) — Andy Hayden, Jun 21 '13 at 20:58
Came here because I had the opposite problem: Given an integer position in dataframe `df`, find the label at that position. After fiddling around, it turned out to be this: you can get the label at index position `n` by using `df.index[n]` — Cameron Yick, Jan 10 '16 at 00:46

score 49 · Accepted Answer · answered Jun 21 '13 at 20:49

You're looking for the index method get_loc:

In [11]: df.index.get_loc(ds)
Out[11]: 1
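
As a minimal sketch of how the result can be used, the snippet below rebuilds the question's frame (with the conventional `pd` alias) and round-trips between label and position. The variable names `start`, `pos`, `label_back` and `row` are illustrative only. The returned position is zero-based, so it can be passed straight to `df.index[...]` or `df.iloc[...]` without any adjustment.

```python
import pandas as pd

# Rebuild the frame from the question: business days, each stamped 16:00.
start = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=start, end=start + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 6}, index=dates)

ds = pd.Timestamp("2013-01-02 16:00")

pos = df.index.get_loc(ds)   # label -> zero-based integer position
label_back = df.index[pos]   # position -> label (the reverse lookup)
row = df.iloc[pos]           # positional row access takes the same integer

print(pos, label_back, row["a"])
```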

Andy Hayden


@Jeff speaking of refresh refresh refresh: http://pandas.pydata.org/pandas-docs/dev/enhancingperf.html – Andy Hayden Jun 21 '13 at 21:07

How would this be extended to a multi-index? AttributeError: 'MultiIndex' object has no attribute 'getloc' – Anonymous Jul 23 '19 at 09:26
Do we need to subtract 1 from this value to index correctly? – Feb 18 '22 at 17:33
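
On the MultiIndex question: the method name is `get_loc` (with an underscore), and `MultiIndex` provides it too. The sketch below uses a small made-up two-level index (the names `mi`, `df_mi`, `letter` and `number` are assumptions for illustration). A full tuple key on a unique index gives a single integer, while a partial key gives a slice covering every matching row.

```python
import pandas as pd

# Hypothetical two-level index, sorted and unique.
mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)
df_mi = pd.DataFrame({"value": [10, 20, 30]}, index=mi)

full = df_mi.index.get_loc(("b", 1))   # complete key -> one integer position
partial = df_mi.index.get_loc("a")     # first level only -> slice of positions

print(full)
print(partial)
print(df_mi.iloc[full]["value"])
```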

Eric Leschinski · Answer 2 · 2016-12-09T23:02:01.837

3

Get dataframe integer index given a date key:

>>> import pandas as pd

>>> df = pd.DataFrame(
    index=pd.date_range("2008-01-01", "2008-01-05"),
    columns=["foo", "bar"])

>>> df["foo"] = [10,20,40,15,10]

>>> df["bar"] = [100,200,40,-50,-38]

>>> df
            foo  bar
2008-01-01   10  100
2008-01-02   20  200
2008-01-03   40   40
2008-01-04   15  -50
2008-01-05   10  -38

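>>> # idxmax() returns the label of the maximum; get_loc() turns that label into a position
>>> df.index.get_loc(df["bar"].idxmax())
1

>>> df.index.get_loc(df["foo"].idxmax())
2

In column bar, the maximum value sits at integer position 1.

In column foo, the maximum value sits at integer position 2.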

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_loc.html
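
A short sketch of the label/position distinction for this example, assuming pandas 1.0 or later: there, `Series.idxmax()` returns the index label of the maximum and `Series.argmax()` returns its integer position directly, so the two routes should agree. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": [10, 20, 40, 15, 10], "bar": [100, 200, 40, -50, -38]},
    index=pd.date_range("2008-01-01", "2008-01-05"),
)

label = df["bar"].idxmax()               # index label of the maximum
pos_via_label = df.index.get_loc(label)  # label -> integer position
pos_direct = df["bar"].argmax()          # integer position in one step

print(label, pos_via_label, pos_direct)
```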

edited Dec 09 '16 at 23:02

answered Dec 09 '16 at 22:54

Please see http://meta.stackexchange.com/questions/19190/should-questions-include-tags-in-their-titles re your edit of the title. – Andy Hayden Dec 09 '16 at 23:08
Google ranks the title of the Stack Overflow question heavily, it does sentiment analysis and lexigraphical parsing of the question title to optimize the best answer to present for a given query. Here is the edit I made. https://stackoverflow.com/posts/17244049/revisions OP put the tags at the end of the question, and I moved them to the start. My style for naming is usually "In tag and tag, How do I X all the Y's?" That way when you do a Google search for "python pandas, how do I X all the Y's?", the blue link text is all you need for 100% confirmation that this has what you need. – Eric Leschinski Jun 16 '18 at 12:57
You could make the argument that the tag set and the title text duplicate words, but the problem is the tags don't show as prominently. Ideally I think the best is to put tags in the title if they're at all useful, only to be removed (as you just did) if it does damage, which in this case it didn't. Try not to go around and remove tags from titles just because they've been duplicated in the tag set; concise, complete verbiage in the title is important to the search engine process. – Eric Leschinski Jun 16 '18 at 12:59

score 0 · Answer 3 · answered Oct 08 '18 at 09:22

get_loc works on both the row index and the column index:

import pandas as pnd
d = pnd.Timestamp('2013-01-01 16:00')
dates = pnd.bdate_range(start=d, end = d+pnd.DateOffset(days=10), normalize = False)

df = pnd.DataFrame(index=dates)
df['a'] = 5
df['b'] = 6
print(df.head())    
                     a  b
2013-01-01 16:00:00  5  6
2013-01-02 16:00:00  5  6
2013-01-03 16:00:00  5  6
2013-01-04 16:00:00  5  6
2013-01-07 16:00:00  5  6

#for rows
print(df.index.get_loc('2013-01-01 16:00:00'))  
 0
#for columns
print(df.columns.get_loc('b'))
 1
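
As a sketch of what the two positions are good for, they can be combined for purely positional cell access with `iat`. The snippet also shows `get_indexer`, which looks up several labels in one call on a unique index. The five-row frame mirrors the `head()` output above; the variable names are illustrative.

```python
import pandas as pd

# Same shape as the head() output above: five business days at 16:00.
dates = pd.bdate_range(start=pd.Timestamp("2013-01-01 16:00"), periods=5, normalize=False)
df = pd.DataFrame({"a": 5, "b": 6}, index=dates)

row_pos = df.index.get_loc(pd.Timestamp("2013-01-04 16:00"))  # row label -> position
col_pos = df.columns.get_loc("b")                             # column label -> position
value = df.iat[row_pos, col_pos]                              # positional cell access

# Several labels at once; a missing label would come back as -1.
col_positions = df.columns.get_indexer(["b", "a"])

print(row_pos, col_pos, value, col_positions.tolist())
```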

score 0 · Answer 4 · answered Jun 30 '22 at 13:18

Because get_loc returns a mask rather than a list of integer index locations when there are multiple instances of the key in the index, I was toying with an answer using reset_index():

# Add a duplicate!!!
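import pandas as pd
dup = pd.Timestamp('2013-01-07 16:00')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, pd.DataFrame([7], columns=['a'], index=[dup])])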
df

                    a
2013-01-01 16:00:00 6
2013-01-02 16:00:00 6
2013-01-03 16:00:00 6
2013-01-04 16:00:00 6
2013-01-07 16:00:00 6
2013-01-08 16:00:00 6
2013-01-09 16:00:00 6
2013-01-10 16:00:00 6
2013-01-11 16:00:00 6
2013-01-07 16:00:00 7
2013-01-08 16:00:00 3

# Only use this method if the key has duplicates
if df.loc[dup].index.has_duplicates:
    print(df.reset_index().loc[df.index.get_loc(dup)].index.to_list())

[4, 9]
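
A possible alternative, sketched under the assumption that `get_loc` returns one of three shapes: an integer for a unique key, a slice when the duplicates are contiguous in a monotonic index, and a boolean mask otherwise. `positions_of` is a hypothetical helper, not a pandas function; it normalizes all three cases to a plain list of integer positions.

```python
import numpy as np
import pandas as pd


def positions_of(index, key):
    """Hypothetical helper: return every integer position of `key` as a list."""
    loc = index.get_loc(key)
    if isinstance(loc, slice):
        # Contiguous duplicates in a monotonic index come back as a slice.
        return list(range(*loc.indices(len(index))))
    if isinstance(loc, np.ndarray):
        # Scattered duplicates come back as a boolean mask.
        return np.flatnonzero(loc).tolist()
    # A unique key comes back as a single integer.
    return [int(loc)]


unique = pd.Index(list("abc"))
sorted_dups = pd.Index(list("abbc"))
unsorted_dups = pd.Index(list("abcb"))

print(positions_of(unique, "b"))
print(positions_of(sorted_dups, "b"))
print(positions_of(unsorted_dups, "b"))
```

Called as `positions_of(df.index, dup)`, the helper avoids the `reset_index()` round trip and needs no separate duplicate check.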

BTW - I wanted to say THANKS to Andy Hayden whose earlier answer helped. — Rummble, Jun 30 '22 at 13:25
