59

I have a Pandas DataFrame with a named index. I want to pass it to a piece of code that takes a DataFrame, a column name, and some other stuff, and does a bunch of work involving that column. In this case, though, the column I want to highlight is the index, and giving the index's label to this piece of code doesn't work because you can't extract an index the way you can a regular column. For example, I can construct a DataFrame like this:

import numpy as np
import pandas as pd

# list(...) materializes the map object on Python 3
df = pd.DataFrame({'name': list(map(chr, range(97, 102))), 'id': range(10000, 10005), 'value': np.random.randn(5)})
df = df.set_index('name')

Here's the result:

         id     value
name                 
a     10000  0.659710
b     10001  1.001821
c     10002 -0.197576
d     10003 -0.569181
e     10004 -0.882097

Now how do I go about accessing the name column?

print(df.index)  # No problem
print(df['name'])  # KeyError: 'name'

I know there are workarounds like duplicating the column or changing the index to something else. But is there something cleaner, like some form of column access that treats the index the same way as everything else?

jpp
kuzzooroo

3 Answers

30

Index has a special meaning in Pandas. It's used to optimise specific operations and can be used in various methods such as merging / joining data. Therefore, make a choice:

  • If it's "just another column", use reset_index and treat it as another column.
  • If it's genuinely used for indexing, keep it as an index and use df.index.

We can't make this choice for you. It should be dependent on the structure of your underlying data and on how you intend to analyse your data.
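As a rough sketch of the two options, assuming the same example frame as in the question (the variable names `flat`, `names_from_column` and `names_from_index` are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the question's example frame (the 'value' column is random).
df = pd.DataFrame({
    'name': list(map(chr, range(97, 102))),
    'id': range(10000, 10005),
    'value': np.random.randn(5),
}).set_index('name')

# Option 1: reset_index returns a new frame in which 'name' is an
# ordinary column; df itself keeps its index.
flat = df.reset_index()
names_from_column = flat['name']

# Option 2: keep the index and read it through df.index;
# to_series() converts it to a Series if Series methods are needed.
names_from_index = df.index.to_series()
```

Which of the two fits better still depends on how the data is used downstream.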

For more information on use of a dataframe index, see:

jpp
  • 9
    Say I have a library function that takes a DataFrame and creates a scatter plot based on it. It labels points in the plot based on the column of your choice, currently specified as a string. Now a use case has come up where it would be useful for the labels to be based on the index of a certain DataFrame. The index of this DataFrame is undoubtedly _special_, as you say. It's just in the context of this one function where it would be convenient to treat the index like a regular column, and I'm wondering if it can be done transparently. – kuzzooroo Sep 02 '18 at 19:53
  • @kuzzooroo, I suggest you ask [a separate question](https://stackoverflow.com/questions/ask) with a [mcve] of the problem you are facing. The example you gave in your question, for example, doesn't *show* any problem with using `df.index`. The methods available to `pd.Index` objects are different to those available to `pd.Series` objects so we need to *see* your code to determine the issue. – jpp Sep 02 '18 at 21:16
  • 7
    One wonders what the point of giving the index a name is, if it can't be used as such like other column names ... – guibar Jan 08 '21 at 17:14
  • 1
    @guibar, you can with [pd.DataFrame.query](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html) – jpp Jan 08 '21 at 18:28
  • 3
    This is not really a solution and your answer is out of scope. So what you're implying is that "no, it's not possible because Pandas did not build an interface to interact with the index like a column". If yes, then let this be the answer. Now a natural follow-up question is why not? We've seen other software being able to do this, like SQL. – Keto May 28 '21 at 18:31
  • 1
    this is not an answer to the OP's question. it's a justification of why the problem exists in the first place... and as a pandas user of more than a decade now, i'm still confused about why the API works this way. – william_grisaitis Feb 25 '22 at 19:56
  • @grisaitis, You have a couple of options: provide a better answer, or propose a Pandas code update to development team. – jpp Feb 26 '22 at 07:59
  • @jpp thanks. i've upvoted answers and github issues that i believe do that. – william_grisaitis Mar 08 '22 at 23:43
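For what it's worth, the `pd.DataFrame.query` suggestion in the comments above might look something like the sketch below. It assumes a cut-down version of the question's frame. Note that `query` filters rows by an expression; it lets the index name appear in that expression but does not return the index as a column.

```python
import pandas as pd

# Cut-down version of the question's frame, with the index named 'name'.
df = pd.DataFrame(
    {'id': range(10000, 10005)},
    index=pd.Index(list('abcde'), name='name'),
)

# Inside the query string, the index name can be used like a column name.
subset = df.query("name == 'c'")
```

Here `subset` holds the single row whose index label is `'c'`.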
17

You could also use df.index.get_level_values if you need to access an index level by name. It also works with hierarchical indices (MultiIndex).

>>> df.index.get_level_values('name')
Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='name')
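
If the calling code needs one entry point that accepts either a column name or an index name, a small wrapper along these lines could work. This is only a sketch: `get_labelled_values` is a hypothetical helper, not part of pandas, and it assumes that a column should win when a label names both a column and an index level.

```python
import pandas as pd

def get_labelled_values(df, label):
    """Hypothetical helper: return the values for `label`, whether it
    names a regular column or a level of the index."""
    if label in df.columns:
        return df[label]
    if label in df.index.names:
        # Wrap the level values in a Series that shares the frame's index,
        # so the result can be used much like df[label].
        return pd.Series(df.index.get_level_values(label), index=df.index, name=label)
    raise KeyError(label)

df = pd.DataFrame(
    {'id': range(10000, 10005)},
    index=pd.Index(list('abcde'), name='name'),
)

names = get_labelled_values(df, 'name')  # taken from the index
ids = get_labelled_values(df, 'id')      # taken from a regular column
```
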
Jongwook Choi
  • Does this provide a form of access that is agnostic between the index and other named columns, or are you simply providing a different way to access an index that involves the index's name? – kuzzooroo Dec 19 '22 at 13:35
  • It's the latter -- it works only for the index columns. – Jongwook Choi Apr 28 '23 at 02:24
9

Instead of using reset_index, you could just copy the index to a normal column, do some work and then drop the column, for example:

df['tmp'] = df.index
# do stuff based on df['tmp']
del df['tmp']
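
A possible variant of the same idea, if you would rather not modify df at all, is `DataFrame.assign`, which returns a new frame with the extra column. A minimal sketch, assuming a cut-down version of the question's frame (`with_tmp` is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame(
    {'id': range(10000, 10005)},
    index=pd.Index(list('abcde'), name='name'),
)

# assign returns a new frame carrying the extra column; df is left
# unchanged, so there is nothing to delete afterwards.
with_tmp = df.assign(tmp=df.index)

# do stuff based on with_tmp['tmp']
```
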
Ian Ash
  • 1
    I like this solution too, in the end column and index serve different purposes, just keep both if needed. – Peruz Feb 07 '20 at 04:14