235

I am trying to access the index of a row in a function applied across an entire DataFrame in Pandas. I have something like this:

df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df
   a  b  c
0  1  2  3
1  4  5  6

and I'll define a function that access elements with a given row

def rowFunc(row):
    return row['a'] + row['b'] * row['c']

I can apply it like so:

df['d'] = df.apply(rowFunc, axis=1)
>>> df
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

Awesome! Now what if I want to incorporate the index into my function? The index of any given row in this DataFrame before adding d would be Index([u'a', u'b', u'c', u'd'], dtype='object'), but I want the 0 and 1. So I can't just access row.index.

I know I could create a temporary column in the table where I store the index, but I'm wondering if it is stored in the row object somewhere.

Mike
  • 6,813
  • 4
  • 29
  • 50
  • 1
    Aside: is there a reason you need to use `apply`? It's much slower than performing vectorized ops on the frame itself. (Sometimes apply *is* the simplest way to do something, and performance considerations are often exaggerated, but for your particular example it's as easy *not* to use it.) – DSM Oct 30 '14 at 16:26
  • 6
    @DSM in actuality I am calling another objects constructor for each row using different row elements. I just wanted to put a minimal example together to illustrate the question. – Mike Oct 30 '14 at 17:27

3 Answers3

282

To access the index in this case you access the name attribute:

In [182]:

df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
def rowFunc(row):
    return row['a'] + row['b'] * row['c']

def rowIndex(row):
    return row.name
df['d'] = df.apply(rowFunc, axis=1)
df['rowIndex'] = df.apply(rowIndex, axis=1)
df
Out[182]:
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

Note that if this is really what you are trying to do that the following works and is much faster:

In [198]:

df['d'] = df['a'] + df['b'] * df['c']
df
Out[198]:
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

In [199]:

%timeit df['a'] + df['b'] * df['c']
%timeit df.apply(rowIndex, axis=1)
10000 loops, best of 3: 163 µs per loop
1000 loops, best of 3: 286 µs per loop

EDIT

Looking at this question 3+ years later, you could just do:

In[15]:
df['d'],df['rowIndex'] = df['a'] + df['b'] * df['c'], df.index
df

Out[15]: 
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

but assuming it isn't as trivial as this, whatever your rowFunc is really doing, you should look to use the vectorised functions, and then use them against the df index:

In[16]:
df['newCol'] = df['a'] + df['b'] + df['c'] + df.index
df

Out[16]: 
   a  b  c   d  rowIndex  newCol
0  1  2  3   7         0       6
1  4  5  6  34         1      16
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • 7
    Would be nice if `name` would be a named tuple in case of a `Multindex`, so that a specific index level could be queried by its name. – Konstantin Jul 02 '20 at 10:33
61

Either:

1. with row.name inside the apply(..., axis=1) call:

df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y'])

   a  b  c
x  1  2  3
y  4  5  6

df.apply(lambda row: row.name, axis=1)

x    x
y    y

2. with iterrows() (slower)

DataFrame.iterrows() allows you to iterate over rows, and access their index:

for idx, row in df.iterrows():
    ...
smci
  • 32,567
  • 20
  • 113
  • 146
  • 2
    and, if concerned, 'itertuples' generally performs far better: https://stackoverflow.com/questions/24870953/does-iterrows-have-performance-issues – dpb Apr 08 '19 at 17:51
13

To answer the original question: yes, you can access the index value of a row in apply(). It is available under the key name and requires that you specify axis=1 (because the lambda processes the columns of a row and not the rows of a column).

Working example (pandas 0.23.4):

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df.set_index('a', inplace=True)
>>> df
   b  c
a      
1  2  3
4  5  6
>>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
>>> df
   b  c  index_x10
a                 
1  2  3         10
4  5  6         40
Freek Wiekmeijer
  • 4,556
  • 30
  • 37