Pandas: Multilevel column names

Question

pandas has support for multi-level column names:

>>>  x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance     first                    
foo              a         b         c
bar       0.102885  0.937838  0.907467

This feature is very useful since it allows multiple versions of the same dataframe to be appended 'horizontally' with the 1st level of the column names (in my example instance) distinguishing the instances.

Imagine I already have a dataframe like this:

                 a         b         c
bar       0.102885  0.937838  0.907467

Is there a nice way to add another level to the column names, similar to this for row index:

x['instance'] = 'first'
x.set_level('instance',append=True)

I don't *think* there is, but there definitely should be. I think there is a feature request for this on github... — Andy Hayden, Jan 29 '14 at 22:31
Although it raises some interesting questions, like "how do I select a particular column when there's two levels of column naming?". — LondonRob, Jan 29 '14 at 22:35
x['first'], x[(first', 'a') or x.xs('a', axis=1, level=1) ? :s — Andy Hayden, Jan 29 '14 at 22:39
this is going to be in 0.14, and will facilitate interesting mi selections: https://github.com/pydata/pandas/pull/6134 — Jeff, Jan 30 '14 at 02:17

score 57 · Answer 1 · answered Oct 13 '14 at 03:26

57

Try this:

df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

columns=[('c','a'),('c','b')]

df.columns=pd.MultiIndex.from_tuples(columns)

answered Oct 13 '14 at 03:26

user3377361

571
1
6
3

3

Someone might like this answer by Romain [here](https://stackoverflow.com/a/40225796/8508004) for doing something similar to this but without the tuples since all the same level being added. – Wayne Jan 31 '18 at 20:40

Ian Zurutuza · Accepted Answer · 2021-01-28T17:56:08.727

21

No need to create a list of tuples

Use: pd.MultiIndex.from_product(iterables)

import pandas as pd
import numpy as np

df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])

Resultant DataFrame:

  new_label                    
          a         b         c
0   0.25999  0.337535  0.333568

Pull request from Jan 25, 2014

edited Jan 28 '21 at 17:56

answered May 18 '20 at 20:29

Ian Zurutuza

588
5
16

2

This looks better than the answer I selected 5+ years ago! I'm assuming it's new since then but I'm going to select it as the "chosen answer". If it's crazy wrong or bad, let me know in the comments here. – LondonRob May 20 '20 at 10:28
2

why do I get `AttributeError: module 'pandas' has no attribute 'Multiindex'` ..? – haneulkim Nov 16 '20 at 07:26
1

@Ambleu I got the same error, write MultiIndex instead of Multiindex (second I is uppercase). – Maria Jan 27 '21 at 10:12

score 19 · Answer 3 · answered Aug 31 '18 at 08:49

You can use concat. Give it a dictionary of dataframes where the key is the new column level you want to add.

In [46]: d = {}

In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31],
                                               [20, 0.34, 0.78, 0.34]]).set_index('idx')

In [48]: pd.concat(d, axis=1)
Out[48]:
    first_level
              a     b     c
idx
10         0.89  0.98  0.31
20         0.34  0.78  0.34

You can use the same technique to create multiple levels.

In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99],
                                                [20, 0.23, 0.26, 0.98]]).set_index('idx')

In [50]: pd.concat(d, axis=1)
Out[50]:
    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98

score 7 · Answer 4 · answered Aug 06 '20 at 22:11

A lot of these solutions seem just a bit more complex than they need to be.

I prefer to make things look as simple and intuitive as possible when speed isn't absolutely necessary. I think this solution accomplishes that. Tested in versions of pandas as early as 0.22.0.

Simply create a DataFrame (ignore columns in the first step) and then set colums equal to your n-dim list of column names.

In [1]: import pandas as pd                                                                                                                                                                                          

In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])                                                                                                                                                              

In [3]: df                                                                                                                                                                                                           
Out[3]: 
   0  1  2  3
0  1  1  1  1
1  2  2  2  2

In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]                                                                                                                                                    

In [5]: df                                                                                                                                                                                                           
Out[5]: 
   a  c  e  g
   b  d  f  h
0  1  1  1  1
1  2  2  2  2

score 5 · Answer 5 · answered Jul 04 '21 at 06:39

x = [('G1','a'),("G1",'b'),("G2",'a'),('G2','b')]
y = [('K1','l'),("K1",'m'),("K2",'l'),('K2','m'),("K3",'l'),('K3','m')]
row_list = pd.MultiIndex.from_tuples(x)
col_list = pd.MultiIndex.from_tuples(y)

A = pd.DataFrame(np.random.randint(2,5,(4,6)), row_list,col_list)
A

This is the most simple and easy way to create MultiLevel columns and rows.

score 1 · Answer 6 · edited May 04 '20 at 06:52

Here is a function that can help you create the tuple, that can be used by pd.MultiIndex.from_tuples(), a bit more generically. Got the idea from @user3377361.

def create_tuple_for_for_columns(df_a, multi_level_col):
    """
    Create a columns tuple that can be pandas MultiIndex to create multi level column

    :param df_a: pandas dataframe containing the columns that must form the first level of the multi index
    :param multi_level_col: name of second level column
    :return: tuple containing (second_level_col, firs_level_cols)
    """
    temp_columns = []
    for item in df_a.columns:
        temp_columns.append((multi_level_col, item))
    return temp_columns

It can be used like this:

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)

score 0 · Answer 7 · answered Jun 25 '23 at 16:01

Improving the pd.concat approach by Carl, how if we got only one row each iteration? This is not an optimized approach, but you can do something like this:

# initial
ds = []

# first iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99]]).set_index('idx')
ds.append(pd.concat(d, axis=1))

# display(ds[0])

# second iteration (can be inside function)
d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[20, 0.34, 0.78, 0.34]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[20, 0.23, 0.26, 0.98]]).set_index('idx')
ds.append(pd.concat(d, axis=1))

# display(ds[1])

# final concat
pd.concat(ds, axis=0)

Result:

	first_level			second_level
idx	a	b	c	a	b	c
10	0.89	0.98	0.31	0.29	0.63	0.99

	first_level			second_level
idx	a	b	c	a	b	c
20	0.34	0.78	0.34	0.23	0.26	0.98

	first_level			second_level
idx	a	b	c	a	b	c
10	0.89	0.98	0.31	0.29	0.63	0.99
20	0.34	0.78	0.34	0.23	0.26	0.98

Pandas: Multilevel column names

7 Answers7

Linked