Apply Function on DataFrame Index

Question

What is the best way to apply a function over the index of a Pandas DataFrame? Currently I am using this verbose approach:

pd.DataFrame({"Month": df.reset_index().Date.apply(foo)})

where Date is the name of the index and foo is the name of the function that I am applying.

It "works", but it returns a numpy array rather than a Pandas Series. — Alex Rothberg, Nov 17 '13 at 01:00
what's your final goal? you can pass array to DataFrame constructor. Or do something like `pd.Series(df.index).apply(foo)` — Roman Pekar, Nov 17 '13 at 01:48
Following from @HYRY if you just want to modify the index of an existing DataFrame you can do `df.index = df.index.map(foo)` — Ben, Jul 14 '14 at 16:20

firelynx · Answer 1 · 2017-05-15T08:56:14.170

139

As already suggested by HYRY in the comments, Index.map is the way to go here. Just assign the result back to the index.

Simple example:

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
df
        d
FOO     1
BAR     2
BAZ     3

df.index = df.index.map(str.lower)
df
        d
foo     1
bar     2
baz     3

Index != Series

As pointed out by the OP, the df.index.map(str.lower) call does not return a Series. In older pandas versions it returned a numpy array; more recent versions return an Index. Either way, DataFrame indices are backed by numpy arrays, not Series.

To turn the index into a Series, create a Series from it.

pd.Series(df.index.map(str.lower))

Caveat

The Index class now subclasses the StringAccessorMixin, which means that you can do the above operation as follows

df.index.str.lower()

This still produces an Index object, not a Series.
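A minimal sketch comparing the return types of these calls, assuming a recent pandas version in which Index.map returns an Index (older releases returned a numpy array). It reuses the example frame from this answer.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])

# Index.map applies the function element-wise
mapped = df.index.map(str.lower)

# The .str accessor does the same for string methods
via_str = df.index.str.lower()

# Wrapping the result in pd.Series gives a Series with a default integer index
as_series = pd.Series(df.index.map(str.lower))

print(type(mapped).__name__, type(via_str).__name__, type(as_series).__name__)
print(list(as_series))
```

Note that the Series built this way is labelled 0..n-1 and no longer carries the original index labels.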

edited May 15 '17 at 08:56

answered Jun 02 '15 at 07:47

firelynx


1

With a multi-index, you can use slicing if you want to use both items in your function, e.g. `x[0]` and `x[1]`. – Elliott Nov 02 '16 at 16:13
3

A bit shorter way `df.index.map(str.lower)` – Zero Dec 31 '16 at 10:33
1

@JohnGalt Thanks for pointing it out. It's not only shorter, but faster, since str.lower is a compiled cython function and the lambda function I wrote is not. – firelynx Jan 01 '17 at 19:30
how does that modify if the function I want to apply needs some argument? e.g. I have a float index and I want to round each value to 2 decimal places – Luca Clissa May 03 '22 at 07:00
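One possible way to address the last comment and the MultiIndex comment above, as a sketch rather than a definitive answer: extra arguments can be bound with a lambda or functools.partial before calling map, and on a MultiIndex each element passed to the function is a tuple. The frames and values below are made up for illustration.

```python
from functools import partial

import pandas as pd

# Float index rounded to 2 decimal places by binding the extra argument
df = pd.DataFrame({'d': [1, 2, 3]}, index=[0.12345, 1.98765, 2.5])
rounded_lambda = df.index.map(lambda x: round(x, 2))
rounded_partial = df.index.map(partial(round, ndigits=2))
df.index = rounded_lambda

# On a MultiIndex, each element is a tuple, so x[0] and x[1] are the level values
mix = pd.MultiIndex.from_tuples([(1, 'hi'), (2, 'there')], names=['num', 'name'])
labels = mix.map(lambda x: f"{x[0]}_{x[1]}")

print(df.index.tolist())
print(labels.tolist())
```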

normanius · Answer 2 · 2021-03-21T17:37:42.170

27

You can convert an index using its to_series() method, and then either apply or map, according to your needs.

ret = df.index.map(foo)                # Returns pd.Index
ret = df.index.to_series().map(foo)    # Returns pd.Series
ret = df.index.to_series().apply(foo)  # Returns pd.Series

All of the above can be assigned directly to a new or existing column of df:

df["column"] = ret

Just for completeness: pd.Index.map, pd.Series.map and pd.Series.apply all operate element-wise. I often use map to apply lookups represented by dicts or pd.Series. apply is more generic because you can pass any function along with additional args or kwargs. The differences between apply and map are further discussed in this SO thread. I don't know why pd.Index.apply was omitted.
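To illustrate the difference described above, here is a small sketch: map used as a dict lookup, and apply forwarding extra arguments. The helper add_affix and the lookup dict are hypothetical names made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])

# map with a dict acts as a lookup; labels missing from the dict become NaN
lookup = {'FOO': 'first', 'BAR': 'second'}
looked_up = df.index.to_series().map(lookup)

# apply forwards extra positional and keyword arguments to the function
def add_affix(label, prefix, suffix=''):  # hypothetical helper
    return f"{prefix}{label}{suffix}"

df['tagged'] = df.index.to_series().apply(add_affix, args=('x_',), suffix='_y')

print(looked_up)
print(df)
```

Because to_series() keeps the original labels as the Series index, the assignment to df['tagged'] aligns row by row.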

edited Mar 21 '21 at 17:37

answered Apr 29 '20 at 22:10

normanius


thank you for your detailed response, the third option got me out of a hole. – dimButTries Dec 13 '21 at 09:27
2

I found the 3rd example useful, as the index is preserved in the Series that is returned. – kristianp Aug 02 '22 at 23:58

score 13 · Answer 3 · answered Jul 13 '15 at 10:51

13

Assuming that you want to make a column in your current DataFrame by applying your function "foo" to the index, you could write:

df['Month'] = df.index.map(foo)

To generate the series alone, you could instead do:

pd.Series({x: foo(x) for x in df.index})
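A concrete sketch of both variants, assuming a DatetimeIndex named Date as in the question. The function foo here is a hypothetical stand-in that extracts the month; the asker's real function is not shown in the thread.

```python
import pandas as pd

def foo(ts):  # hypothetical stand-in for the asker's function
    return ts.month

df = pd.DataFrame(
    {'value': [10, 20, 30]},
    index=pd.DatetimeIndex(['2013-01-15', '2013-02-15', '2013-11-17'], name='Date'),
)

# New column computed from the index
df['Month'] = df.index.map(foo)

# Stand-alone Series that keeps the original index labels
months = pd.Series(df.index.map(foo), index=df.index, name='Month')

print(df)
print(months)
```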

answered Jul 13 '15 at 10:51

suraj747


2

Using for loops in the pandas/numpy ecosystem is highly discouraged. It is very memory inefficient and easily crashes with larger datasets. – firelynx Oct 26 '15 at 14:38

score 6 · Answer 4 · answered Nov 10 '15 at 21:46

A lot of answers are returning the Index as an array, which loses information about the index name etc (though you could do pd.Series(index.map(myfunc), name=index.name)). It also won't work for a MultiIndex.

The way that I worked with this is to use "rename":

import numpy as np
import pandas as pd

mix = pd.MultiIndex.from_tuples([(1, 'hi'), (2, 'there'), (3, 'dude')], names=['num', 'name'])
data = np.random.randn(3)
df = pd.Series(data, index=mix)  # note: a Series, despite the name df
print(df)
num  name 
1    hi       1.249914
2    there   -0.414358
3    dude     0.987852
dtype: float64

# Define a few dictionaries to denote the mapping
rename_dict = {i: i*100 for i in df.index.get_level_values('num')}
rename_dict.update({i: i+'_yeah!' for i in df.index.get_level_values('name')})
df = df.rename(index=rename_dict)
print(df)
num  name       
100  hi_yeah!       1.249914
200  there_yeah!   -0.414358
300  dude_yeah!     0.987852
dtype: float64

The only catch is that the labels must be unique across the different MultiIndex levels, since a single dictionary is applied to every level. Maybe someone more clever than me knows how to get around that. For my purposes this works 95% of the time.

`rename` has a `level` argument (nowadays?). So this removes the ambiguity: `df.rename(index=rename_dict0, level=0).rename(index=rename_dict1, level=1)`. — Antony Hatchkins, Jan 08 '23 at 20:32
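A sketch of the level-based rename suggested in the comment above, using made-up data in which the same label appears in both levels. The names rename_dict0 and rename_dict1 follow the comment; everything else is an assumption for illustration.

```python
import pandas as pd

# Both levels contain the label 1, so one combined dict would be ambiguous
mix = pd.MultiIndex.from_tuples([(1, 1), (2, 1), (3, 2)], names=['num', 'group'])
s = pd.Series([0.1, 0.2, 0.3], index=mix)

# One mapping per level
rename_dict0 = {i: i * 100 for i in s.index.get_level_values('num')}
rename_dict1 = {i: f'g{i}' for i in s.index.get_level_values('group')}

# level restricts each rename to a single level of the MultiIndex
s = s.rename(index=rename_dict0, level=0).rename(index=rename_dict1, level=1)

print(s)
```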
