24

I have a pandas dataframe:

import pandas as pnd
d = pnd.Timestamp('2013-01-01 16:00')
dates = pnd.bdate_range(start=d, end = d+pnd.DateOffset(days=10), normalize = False)

df = pnd.DataFrame(index=dates, columns=['a'])
df['a'] = 6

print(df)
                     a
2013-01-01 16:00:00  6
2013-01-02 16:00:00  6
2013-01-03 16:00:00  6
2013-01-04 16:00:00  6
2013-01-07 16:00:00  6
2013-01-08 16:00:00  6
2013-01-09 16:00:00  6
2013-01-10 16:00:00  6
2013-01-11 16:00:00  6

I am interested in finding the integer location of one of the labels, say,

ds = pnd.Timestamp('2013-01-02 16:00')

Looking at the index values, I know that the integer location of this label is 1. How can I get pandas to tell me the integer location of this label?

Andy Hayden
nitin
  • As a little aside, the traditional alias for pandas is `pd` :) – Andy Hayden Jun 21 '13 at 20:58
  • Came here because I had the opposite problem: Given an integer position in dataframe `df`, find the label at that position. After fiddling around, it turned out to be this: you can get the label at index position `n` by using `df.index[n]` – Cameron Yick Jan 10 '16 at 00:46
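
The reverse lookup mentioned in the comment above can be sketched as follows. This is a minimal illustration that rebuilds the question's frame with the conventional `pd` alias; the variable names `n`, `label`, and `row` are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the frame from the question (business days, 16:00 timestamps).
d = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=d, end=d + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 6}, index=dates)

n = 1
label = df.index[n]   # label stored at integer position n
row = df.iloc[n]      # row stored at integer position n

# Going back from the label to the position closes the loop.
round_trip = df.index.get_loc(label)
print(label, round_trip)
```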

4 Answers

49

You're looking for the index method get_loc:

In [11]: df.index.get_loc(ds)
Out[11]: 1
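
As a hedged follow-up sketch, the position returned by `get_loc` can be passed to `iloc`, and a missing label raises `KeyError`. The `missing` timestamp below is a made-up example (a Saturday, so it is not in the business-day index), and `Index.get_indexer` is shown as one possible way to look up several labels at once, with -1 marking labels that are absent.

```python
import pandas as pd

d = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=d, end=d + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 6}, index=dates)

ds = pd.Timestamp("2013-01-02 16:00")
pos = df.index.get_loc(ds)          # integer position of the label
same_row = df.iloc[pos].equals(df.loc[ds])

# A label that is not in the index raises KeyError.
missing = pd.Timestamp("2013-01-05 16:00")  # hypothetical label, a Saturday
try:
    df.index.get_loc(missing)
    found_missing = True
except KeyError:
    found_missing = False

# get_indexer handles several labels and returns -1 for absent ones.
positions = df.index.get_indexer([ds, missing]).tolist()
print(pos, same_row, found_missing, positions)
```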
Andy Hayden
3

Get the integer position in a DataFrame's index for a given date key:

>>> import pandas as pd

>>> # pd.datetime was removed in pandas 2.0; date strings work in all versions
>>> df = pd.DataFrame(
...     index=pd.date_range("2008-01-01", "2008-01-05"),
...     columns=["foo", "bar"])

>>> df["foo"] = [10,20,40,15,10]

>>> df["bar"] = [100,200,40,-50,-38]

>>> df
            foo  bar
2008-01-01   10  100
2008-01-02   20  200
2008-01-03   40   40
2008-01-04   15  -50
2008-01-05   10  -38

>>> # idxmax returns the index label of the maximum value
>>> df.index.get_loc(df["bar"].idxmax())
1

>>> df.index.get_loc(df["foo"].idxmax())
2

In column bar, the maximum value is at integer position 1.

In column foo, the maximum value is at integer position 2.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_loc.html
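
A minimal sketch of the label-versus-position distinction, assuming a recent pandas version in which `Series.idxmax` returns the index label of the maximum. The NumPy `argmax` call on the underlying array is included only as a cross-check that does not depend on the pandas version; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": [10, 20, 40, 15, 10], "bar": [100, 200, 40, -50, -38]},
    index=pd.date_range("2008-01-01", "2008-01-05"),
)

# Label of the maximum, then its integer position in the index.
bar_label = df["bar"].idxmax()
bar_pos = df.index.get_loc(bar_label)

foo_label = df["foo"].idxmax()
foo_pos = df.index.get_loc(foo_label)

# Cross-check: NumPy's argmax on the raw values gives the position directly.
bar_pos_direct = int(df["bar"].to_numpy().argmax())
print(bar_label, bar_pos, foo_pos, bar_pos_direct)
```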

Eric Leschinski
  • Please see http://meta.stackexchange.com/questions/19190/should-questions-include-tags-in-their-titles re your edit of the title. – Andy Hayden Dec 09 '16 at 23:08
  • Google ranks the title of the stackoverflow question heavily, it does sentiment analysis and lexigraphical parsing of the question title to optimize the best answer to present for a given query. Here is the edit I made. https://stackoverflow.com/posts/17244049/revisions OP put the tags at the end of the question, and I moved them to the start. My style for naming is usually "In tag and tag, How do I X all the Y's?" That way when you do a google search for "python pandas, how do I X all the Y's?", the blue link text is all you need for 100% confirmation that this has what you need. – Eric Leschinski Jun 16 '18 at 12:57
  • You could make the argument that the tag set and the title text duplicate words, but the problem is that the tags don't show as prominently. Ideally I think the best is to put tags in the title if they're at all useful, only to be removed (as you just did) if it does damage, which in this case it didn't. Try not to go around and remove tags from titles just because they've been duplicated in the tag set; concise, complete verbiage in the title is important to the search engine process. – Eric Leschinski Jun 16 '18 at 12:59
0

get_loc works for both row labels and column labels:

import pandas as pnd
d = pnd.Timestamp('2013-01-01 16:00')
dates = pnd.bdate_range(start=d, end = d+pnd.DateOffset(days=10), normalize = False)

df = pnd.DataFrame(index=dates)
df['a'] = 5
df['b'] = 6
print(df.head())    
                     a  b
2013-01-01 16:00:00  5  6
2013-01-02 16:00:00  5  6
2013-01-03 16:00:00  5  6
2013-01-04 16:00:00  5  6
2013-01-07 16:00:00  5  6

#for rows
print(df.index.get_loc('2013-01-01 16:00:00'))  
 0
#for columns
print(df.columns.get_loc('b'))
 1
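
As a hedged illustration of why both lookups are useful together, the two integer positions can be combined in a single `iloc` call. The names `row_pos`, `col_pos`, and `value` are introduced here for the sketch, which rebuilds the same two-column frame with the `pd` alias.

```python
import pandas as pd

d = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=d, end=d + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 5, "b": 6}, index=dates)

row_pos = df.index.get_loc(pd.Timestamp("2013-01-02 16:00"))  # row position
col_pos = df.columns.get_loc("b")                             # column position

# Purely positional access to the same cell that df.loc[label, "b"] would return.
value = df.iloc[row_pos, col_pos]
print(row_pos, col_pos, value)
```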
Sadi
0

Because get_loc returns a boolean mask (or a slice, if the index is monotonic) rather than a list of integer positions when the key appears more than once in the index, I was toying with an answer using reset_index():

import pandas as pd

# Add a duplicate label. DataFrame.append was removed in pandas 2.0, so use pd.concat.
dup = pd.Timestamp('2013-01-07 16:00')
df = pd.concat([df, pd.DataFrame([7], columns=['a'], index=[dup])])
df

                     a
2013-01-01 16:00:00  6
2013-01-02 16:00:00  6
2013-01-03 16:00:00  6
2013-01-04 16:00:00  6
2013-01-07 16:00:00  6
2013-01-08 16:00:00  6
2013-01-09 16:00:00  6
2013-01-10 16:00:00  6
2013-01-11 16:00:00  6
2013-01-07 16:00:00  7

# Only use this method if the key has duplicates
if df.loc[dup].index.has_duplicates:
    # get_loc returns a boolean mask here; reset_index exposes the integer positions
    print(df.reset_index().loc[df.index.get_loc(dup)].index.to_list())

[4, 9]
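
An alternative sketch, not taken from the answer above: comparing the index against the key and passing the result to NumPy's `flatnonzero` yields the integer positions directly, whether the key is unique or duplicated. It assumes the question's frame with one duplicate row added at the end; `positions` is an illustrative name.

```python
import numpy as np
import pandas as pd

d = pd.Timestamp("2013-01-01 16:00")
dates = pd.bdate_range(start=d, end=d + pd.DateOffset(days=10), normalize=False)
df = pd.DataFrame({"a": 6}, index=dates)

# Append one row that reuses an existing label.
dup = pd.Timestamp("2013-01-07 16:00")
df = pd.concat([df, pd.DataFrame({"a": [7]}, index=[dup])])

# Elementwise comparison gives a boolean array; flatnonzero lists the True positions.
positions = np.flatnonzero(df.index == dup).tolist()
print(positions)  # [4, 9]
```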
Rummble