90

Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a function which uses the index of the row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series with the same MultiIndex, holding the result of applying the function to each row.

elyase
  • 4
    See this discussion, seems like x.name is what you are looking for http://stackoverflow.com/questions/26658240/getting-the-index-of-a-row-in-a-pandas-apply-function – Pablo Jadzinsky Dec 03 '15 at 17:13
  • @PabloJadzinsky That discussion is about DataFrame not for Series I think – vishalv2050 Apr 20 '20 at 07:33

7 Answers

59

I don't believe Series.apply has access to the index; it passes each value to the function as a NumPy scalar, not as a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)

Other approaches might use s.index.get_level_values(), which often gets a little ugly in my opinion, or iterating with s.items(), which is likely to be slower, perhaps depending on exactly what f does.
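As a minimal sketch of the reset_index approach on the Series from the question: the index level names "a" and "b" come from the question, while the Series name "value" and the rule inside f are assumptions made up for illustration.

```python
import pandas as pd

# Series shaped like the one in the question: index levels "a" and "b".
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="value")  # "value" is an assumed name


def f(row):
    # Hypothetical rule: add both index levels to the value when a < b.
    if row["a"] < row["b"]:
        return row["value"] + row["a"] + row["b"]
    return row["value"]


# reset_index() turns the levels into ordinary columns "a" and "b";
# the original MultiIndex is put back on the result afterwards.
result = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
print(result)
```

Because the levels become ordinary columns, f can refer to them by name instead of by position.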

Dan Allan
  • 1
    Also worth noting that vectorising f, and using & | etc., may also be faster. – Andy Hayden Aug 19 '13 at 14:54
  • Currently I use the reset_index approach, will hold a little to see if someone proposes a cleaner solution. – elyase Aug 19 '13 at 14:54
  • 5
    +1 For getting rid of the `MultiIndex`. While these are occasionally useful, more and more I find myself turning my indices into columns. – Phillip Cloud Aug 19 '13 at 15:50
  • 1
    In my case (a dataframe, with axis=1), x.name() returns the value of the index when I apply a function lambda x: x ... – Christophe Sep 24 '15 at 15:30
  • Which is totally moronic behaviour but ye, what you say is completely right, however your solution is not ideal, for most use cases Jeff's answer `DataFrame(s).apply(x)` is much more straightforward and should be the accepted answer IMHO! – meow Mar 04 '19 at 11:14
  • This makes sense to me, because I want the object, not the index. But when I'm creating a function x(i) and then I'm printing i, it's printing out the index. From everything I'm reading, I understand that it shouldn't be accessing the index at all. – Veggiet Jun 12 '20 at 17:56
19

Make it a DataFrame, and return scalars from the function if you want the result to be a Series.

Setup

In [11]: s = pd.Series([1, 2, 3], dtype='float64', index=['a', 'b', 'c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return the scalars (access the index via the name attribute)

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64
Jeff
  • 1
    so when calling `apply` on `DataFrame` its index will be accessible through `name` of each series? I see this also is true for `DateTimeIndex` but it is a little weird to use something similar to `x.name == Time(2015-06-27 20:08:32.097333+00:00)` – dashesy Jun 28 '15 at 17:03
  • 4
    This should be the answer, adopting `x.name` is the cleanest and most flexible way of addressing the problem. – Thomas Kimber May 31 '17 at 10:20
14

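Convert the Series to a DataFrame and apply along the rows (axis=1). Inside f, x is now a one-element Series holding the value, and its index label is available as x.name. If f returns a Series, the trailing [0] selects the single value column; if f returns a scalar, the result is already a Series and the [0] should be dropped.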

s.to_frame(0).apply(f, axis=1)[0]
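A small sketch of how this might look with the MultiIndex Series from the question. With a MultiIndex, x.name is a tuple of the level values. The rule inside f is hypothetical, and f returns a scalar here, so apply already yields a Series and no trailing [0] is used.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(x):
    a, b = x.name        # tuple of index labels for this row
    value = x.iloc[0]    # the single value held by the one-element row
    # Hypothetical rule: scale the value when both levels are equal.
    return value * 10 if a == b else value + a + b


result = s.to_frame().apply(f, axis=1)
print(result)
```

The result keeps the original MultiIndex, so nothing has to be reattached afterwards.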
nehz
3

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also apply NumPy-style logic and functions to any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64

I recommend testing for speed (the gain over apply will depend on the function). That said, I find that apply calls are often more readable...
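For a MultiIndex, the same vectorised idea might look like the sketch below, which pulls each level out with index.get_level_values. The Series is the one from the question; the condition and the replacement values are assumptions chosen for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Extract each index level as a plain NumPy array.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()

# Keep the value where a == b, otherwise use value + a + b (hypothetical rule).
result = s.where(a == b, s + a + b)
print(result)
```

No Python-level function is called per row here, which is where any speed advantage over apply would come from.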

Andy Hayden
  • 2
    Hm. Now I wonder if there should be a `Series.eval`/`query` method...I'll bring this up over at pandas. – Phillip Cloud Aug 19 '13 at 15:54
  • 1
    @PhillipCloud, +1, I need to use indices a lot(add/subs, aligns and missing data) and this would be great to have. – elyase Aug 19 '13 at 16:38
  • I'm finding increasingly more often that if I convert my `MultiIndex`es to columns I'm much happier and life is easier. There's *so* much more you can do with columns in a `DataFrame` than a `Series` with a `MultiIndex`, in fact they are essentially the same thing, except queries will be faster in the `DataFrame` columns than in the `Series`-with-`MultiIndex`. – Phillip Cloud Aug 19 '13 at 16:50
  • @PhillipCloud I'm the same, they should really be first class citizens (rather than the opposite). – Andy Hayden Aug 19 '13 at 17:13
  • This doesn't answer the question "Access index in pandas.Series.apply" – luca Nov 30 '17 at 15:37
0

You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().

def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    if row['N1']==1:
        return 0
    else:
        return 1

import pandas as pd
import numpy as np
df4 = pd.DataFrame(np.random.rand(6,1), columns=list('I'))
df4['N1']=df4.apply(f1, axis=1)
df4['N2']=df4.apply(f2, axis=1)
0

Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.

The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.

With a Singly Indexed Series

s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

With a Multi-Indexed Series

Same concept here, but you'll need to access the index values as row['level_*'] because that's where they're placed by Series.reset_index().

s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

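If your Series or its index levels have names, reset_index() uses those names for the columns instead of 'index', 'level_*' and 0, so you will need to adjust the lookups accordingly.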
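A short sketch of the named case. The names "key" and "val" are made up for this example.

```python
import pandas as pd

s = pd.Series(
    ["val1", "val2"],
    index=pd.Index(["idx1", "idx2"], name="key"),  # named index (assumed name)
    name="val",                                    # named Series (assumed name)
)

frame = s.reset_index()
print(list(frame.columns))  # ['key', 'val']

# The row lookups now use the names instead of 'index' and 0.
s2 = frame.apply(lambda row: "{}={}".format(row["key"], row["val"]), axis=1)
s2.index = s.index
print(s2)
```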

waterproof
0

Series implements the items() method, which enables the use of list comprehensions to map keys (i.e. index values) and values.

Given a series:

In[1]: seriesA = pd.Series([4, 2, 3, 7, 9], name="A")
In[2]: seriesA
Out[2]:
0    4
1    2
2    3
3    7
4    9
Name: A, dtype: int64

Now, assume a function f that takes a key and a value:

def f(key, value):
    return key + value

We can now create a new series by using a list comprehension:

In[3]: pd.Series(data=[f(k, v) for k, v in seriesA.items()], index=seriesA.index)
Out[3]:
0     4
1     3
2     5
3    10
4    13
dtype: int64

Of course this doesn't take advantage of any NumPy performance goodness, but for some operations it makes sense.
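The same pattern should carry over to the MultiIndex Series from the question, where each key yielded by items() is a tuple of the level values. A minimal sketch, with a hypothetical f:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(key, value):
    a, b = key  # key is a tuple such as (1, 2)
    # Hypothetical computation that uses both index levels.
    return value + a + b


result = pd.Series([f(k, v) for k, v in s.items()], index=s.index)
print(result)
```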

Felix Leipold