5

How can I rename MultiIndex columns in pandas?

For example here is what I would like to be able to do:

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)]))

df.rename(columns={('a', 1): 'd', ('a', 2): 'e', ('b', 1): 'f'}, errors='raise')

However, this has no effect, returning a DataFrame with the same column names as the original:

   a     b
   1  2  1
0  1  2  3
1  4  5  6
2  7  8  9

I would like to get:

   d  e  f
0  1  2  3
1  4  5  6
2  7  8  9

(I'm using errors='raise' to ensure I'm referencing the column names correctly.)

This works when you don't have a MultiIndex:

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=['a1', 'a2', 'b1'])

df.rename(columns={'a1': 'd', 'a2': 'e', 'b1': 'f'}, errors='raise')

Returns:

   d  e  f
0  1  2  3
1  4  5  6
2  7  8  9

I've messed around with variations on this, for example using the level argument and having the new names have the same number of levels but with no luck.

There are other related questions but they tend to focus on solving some larger problems. I can think of ways to get the result I need to solve my larger problem but I'm purposely not asking that here. This approach with rename seems the most natural and I would like to understand why it doesn't work or what I'm doing wrong. If there is an alternative to rename that I should be using that is relevant to my question, or pieces of information on rename proving it doesn't do what I think it should, they would be much appreciated.

The most similar question is here but the answers don't address my question: specifying the values of a single level is inadequate in my example as a single level cannot specify each column uniquely and set_levels can't address individual columns.

Fabio Veronese
  • 7,726
  • 2
  • 18
  • 27

2 Answers2

6

You could flatten the index first, using to_flat_index:

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)]))

lookup = {('a', 1): 'd', ('a', 2): 'e', ('b', 1): 'f'}

# flatten index
df.columns = df.columns.to_flat_index()

# rename using lookup
result = df.rename(columns=lookup)

print(result)

Output

   d  e  f
0  1  2  3
1  4  5  6
2  7  8  9
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • 1
    This is a nice approach, it hadn't clicked for me that the column name could be tuple. It avoids the string manipulation of other similar solutions that I've seen on other questions. It adds only one additional step (a call to `to_flat_index`) to what I was trying to do. I'm just going to wait and see if anyone can explain why rename doesn't work how I expected. – rich3948756410927 Oct 29 '19 at 12:17
1

I suggest use list comprehension with get with same default value x:

d = {('a', 1): 'd', ('a', 2): 'e', ('b', 1): 'f'}

df.columns = [d.get(x, x) for x in df.columns]
print (df)
   d  e  f
0  1  2  3
1  4  5  6
2  7  8  9

Because if test some function with rename, it processing each column name separately:

Test:

def f(x):
    print (x)

a
1
a
2
b
1

print (df.rename(columns = f))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks for the `get` suggestion, I'm not sure which is more readable - this or the `to_flat_index` approach below. Is it safe assume that rename handling a function mapper the same way as a dict? I'm not sure it is as if this was the case we should get an error when using `errors='raise'`. – rich3948756410927 Oct 29 '19 at 12:40
  • @rich3948756410927 - `rename` with `MultiIndex` working if need only rename some scalars, only solution is flatten if need change both. – jezrael Oct 29 '19 at 12:42