92

Let's say I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
print(df)
#    A  B
# a  0  0
# b  1  1
# c  2  2
# d  3  3
# e  4  4

Assuming this DataFrame already exists, how can I simply add a level 'C' to the column index so that I get this:

   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

I saw SO answers like python/pandas: how to combine two dataframes into one with hierarchical column index?, but that concatenates different DataFrames instead of adding a column level to an already existing DataFrame.

Community
Steven G

8 Answers

139

As suggested by @StevenG himself, a better answer:

df.columns = pd.MultiIndex.from_product([df.columns, ['C']])

print(df)
#    A  B
#    C  C
# a  0  0
# b  1  1
# c  2  2
# d  3  3
# e  4  4
Romain
  • This is great. I like `pd.MultiIndex.from_product([df.columns, ['C']])`, which is a bit more straightforward since you don't have to keep track of the `len` of `df.columns`. Would you mind adding it to the answer so I can accept it? – Steven G Oct 24 '16 at 19:31
  • @StevenG Great, I did not know this trick. Thanks, I have learned something new :-) – Romain Oct 24 '16 at 19:38
  • Do you have any tips on how to add another level when the original df already has MultiIndex column names? I tried to add a new level with the from_product() method, but I received this error message: 'NotImplementedError: isnull is not defined for MultiIndex'. – Lenka Vraná Sep 15 '17 at 11:39
  • @LenkaVraná `pd.MultiIndex.from_product(df.columns.levels + [['C']])` – user3556757 Dec 27 '19 at 09:48
  • @user3556757 this unfortunately did not work for me (unhashable type 'index' or 'list') – ElectRocnic Jan 11 '20 at 12:08
  • EDIT: got it with `pd.MultiIndex.from_product([pd.Index(['C'])] + df.columns.levels)` (my order is reversed) (don't know what went wrong) – ElectRocnic Jan 11 '20 at 12:31
  • For anyone: I found that casting the existing columns index to a list before using it in MultiIndex.from_product works around the 'isna not implemented' error. `pd.MultiIndex.from_product([list(df.columns), ['C']])` – Max Jan 20 '20 at 11:15
  • Although you then have to flatten the indices. You could use `pd.concat([df], keys=[], names=[''],axis=1)` for the same result. – Max Jan 20 '20 at 11:41
  • Can this be done for one column only? For example: # A B C # a 0 0 # b 1 1 # c 2 2 # d 3 3 # e 4 4 – Jenny Nov 02 '22 at 10:07
  • Fastest (or at least tied) of all the answers I tested, and easiest to read. – fantabolous Apr 24 '23 at 06:48
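The comments above ask how to add a level when the columns are already a MultiIndex. Calling `from_product` over `df.columns.levels` builds every combination of the level values, which only matches the existing columns when they already form a full product. One option not posted in the thread is to rebuild the columns from their per-level values with `pd.MultiIndex.from_arrays`. This is a minimal sketch; the small two-level example frame is an assumption for illustration:

```python
import pandas as pd

# Assumed example frame whose columns are already a two-level MultiIndex
df = pd.DataFrame(
    [[0, 1], [2, 3]],
    columns=pd.MultiIndex.from_tuples([("A", "x"), ("B", "y")]),
)

# Take the values of every existing level, then append a constant level "C".
# This keeps exactly the existing column combinations (no Cartesian product).
levels = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]
df.columns = pd.MultiIndex.from_arrays(levels + [["C"] * len(df.columns)])

print(df.columns.tolist())  # [('A', 'x', 'C'), ('B', 'y', 'C')]
```

The same code also works on flat columns, since a plain Index has `nlevels == 1`.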
28

Option 1: set_index and T

df.T.set_index(np.repeat('C', df.shape[1]), append=True).T

Option 2: pd.concat, keys, and swaplevel

pd.concat([df], axis=1, keys=['C']).swaplevel(0, 1, 1)

piRSquared
  • Thanks, I did not know about swaplevel and it is convenient. I tested it on a large DataFrame to see if it was more efficient than setting `pd.MultiIndex.from_product([df.columns, ['C']])`, and it was about 25% slower. – Steven G Oct 24 '16 at 19:33
  • No surprises! Romain's answer is quicker. I added this because I think it's valuable to know. – piRSquared Oct 24 '16 at 19:34
  • `pd.concat([df], axis=1, keys=['C'])` worked very well for multilevel columns – Justislav Bogevolnov Mar 05 '18 at 11:25
  • Option 2 should be the accepted answer for the general case when `df.columns` can be a `pd.MultiIndex`. – Josh Jun 13 '19 at 02:50
  • The `pd.concat` answer is great because it doesn't modify the original df. – BallpointBen Jul 25 '19 at 17:18
  • Always be careful with .T, since it can disrupt well-typed columns: in general, a .T.T round trip is lossy. Using seaborn, take `df = sns.load_dataset("diamonds")` and compare `df.info()` with `df.T.T.info()`; all columns turn into object and memory usage grows five times! – creanion May 26 '22 at 07:58
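creanion's warning about `.T` can be checked without seaborn on a small mixed-dtype frame. The sketch below (the example data is an assumption, not from the thread) runs both options from this answer. With mixed dtypes, the transpose round trip in option 1 leaves every column as `object`, while option 2 never transposes and so keeps the original dtypes:

```python
import numpy as np
import pandas as pd

# Assumed example frame with an int, a float and a string column
df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5], "S": ["x", "y"]})

# Option 1: transpose, append an index level, transpose back
via_T = df.T.set_index(np.repeat("C", df.shape[1]), append=True).T

# Option 2: wrap under key "C", then move that key to the bottom level
via_concat = pd.concat([df], axis=1, keys=["C"]).swaplevel(0, 1, axis=1)

print(via_T.dtypes)       # every column is object after the round trip
print(via_concat.dtypes)  # original int / float / object dtypes are kept
```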
17

A solution which adds a name to the new level and is easier on the eyes than other answers already presented:

df['newlevel'] = 'C'
df = df.set_index('newlevel', append=True).unstack('newlevel')

print(df)
#           A  B
# newlevel  C  C
# a         0  0
# b         1  1
# c         2  2
# d         3  3
# e         4  4
mbugert
  • This is short and also works with columns that are already multi-level! As a one-liner: `df.assign(newlevel='C').set_index('newlevel', append=True).unstack('newlevel')`. – Michele Piccolini Mar 08 '21 at 14:07
  • If the DataFrame has very many rows, this incurs an unnecessary per-row cost. – creanion May 26 '22 at 08:07
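Michele Piccolini's one-liner differs from the two-step version in that `assign` returns a copy, so the original `df` keeps its flat columns and no helper column is left behind. A minimal sketch using the question's frame:

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# assign() works on a copy; set_index/unstack move the constant column
# "newlevel" into a named column level
out = (
    df.assign(newlevel="C")
      .set_index("newlevel", append=True)
      .unstack("newlevel")
)

print(out.columns.tolist())       # [('A', 'C'), ('B', 'C')]
print(list(out.columns.names))    # [None, 'newlevel']
print(list(df.columns))           # ['A', 'B'], df itself is unchanged
```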
11

You could just assign the columns like this:

>>> df.columns = [df.columns, ['C', 'C']]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>> 

Or, for an unknown number of columns:

>>> df.columns = [df.columns.get_level_values(0), np.repeat('C', df.shape[1])]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>> 
U13-Forward
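The `np.repeat` version above can also be written with a plain list, which avoids the NumPy import. This is a minimal sketch relying on the same behavior the answer shows: when a list of equal-length arrays is assigned to `columns`, pandas builds a MultiIndex from it.

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# A list of equal-length arrays becomes a two-level MultiIndex;
# ["C"] * len(df.columns) adapts to any number of columns.
df.columns = [df.columns, ["C"] * len(df.columns)]

print(isinstance(df.columns, pd.MultiIndex))  # True
print(df.columns.tolist())                    # [('A', 'C'), ('B', 'C')]
```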
9

Another way for columns that are already a MultiIndex (inserting 'E' as a new middle level):

df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[0], 'E', x[1]), df.columns))

   A  B
   E  E
   C  D
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
Itamar Mushkin
Anton Abrosimov
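The lambda above hard-codes where 'E' goes. Slicing each column tuple lets the position be chosen. This is a minimal sketch; the starting columns ('A', 'C') and ('B', 'D') are an assumption inferred from the output shown, and `pos` is a hypothetical variable name:

```python
import pandas as pd

# Assumed starting frame with columns ('A', 'C') and ('B', 'D')
df = pd.DataFrame(
    index=list("abcde"),
    data=[[i, i] for i in range(5)],
    columns=pd.MultiIndex.from_tuples([("A", "C"), ("B", "D")]),
)

pos = 1  # where the new level goes: 0 = top, df.columns.nlevels = bottom
df.columns = pd.MultiIndex.from_tuples(
    [col[:pos] + ("E",) + col[pos:] for col in df.columns]
)

print(df.columns.tolist())  # [('A', 'E', 'C'), ('B', 'E', 'D')]
```

Like the original lambda, this does not carry over level names; pass `names=` to `from_tuples` if the columns had any.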
4

I like it explicit (using MultiIndex) and chain-friendly (.set_axis):

df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)

This is particularly convenient when merging DataFrames with different numbers of column levels, where pandas (1.4.2) raises a FutureWarning (FutureWarning: merging between different levels is deprecated and will be removed ... ):

import pandas as pd

df1 = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df2 = pd.DataFrame(index=list('abcde'), data=range(10, 15), columns=pd.MultiIndex.from_tuples([("C", "x")]))

# df1:
#    A  B
# a  0  0
# b  1  1

# df2:
#     C
#     x
# a  10
# b  11

# merge while giving df1 another column level:
pd.merge(df1.set_axis(pd.MultiIndex.from_product([df1.columns, ['']]), axis=1),
         df2,
         left_index=True, right_index=True)

# result:
#    A  B   C
#           x
# a  0  0  10
# b  1  1  11


mcsoini
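Because `set_axis` returns a new DataFrame, the same idea also fits into `.pipe` inside a method chain. This is a minimal sketch; `add_col_level` is a hypothetical helper name, not a pandas function, and `df2` is built from a tuple-keyed dict, which gives it the same two-level ("C", "x") column as above:

```python
import pandas as pd

def add_col_level(frame, value):
    """Hypothetical helper: return a copy of `frame` with a constant bottom column level."""
    new_columns = pd.MultiIndex.from_product([frame.columns, [value]])
    return frame.set_axis(new_columns, axis=1)

df1 = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})
# A dict keyed by a tuple gives df2 two-level columns: ("C", "x")
df2 = pd.DataFrame({("C", "x"): range(10, 15)}, index=list("abcde"))

# Both sides now have two column levels, so the merge lines them up directly
merged = df1.pipe(add_col_level, "").merge(df2, left_index=True, right_index=True)
print(merged)
```

`df1` itself is not modified; only the piped copy gets the extra level.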
0

Another method, using a list comprehension of tuples as the argument to pandas.MultiIndex.from_tuples():

df.columns = pd.MultiIndex.from_tuples([(col, 'C') for col in df.columns])

df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
drT
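The list comprehension extends to columns that are already a MultiIndex, because iterating a MultiIndex yields tuples. This is a minimal sketch; `append_level` is a hypothetical helper, and the `isinstance(col, tuple)` check assumes flat column labels are not themselves tuples:

```python
import pandas as pd

def append_level(columns, value, name=None):
    """Hypothetical helper: append a constant level to flat or MultiIndex columns."""
    tuples = [
        (*col, value) if isinstance(col, tuple) else (col, value)
        for col in columns
    ]
    # Keep any existing level names and name the new bottom level
    return pd.MultiIndex.from_tuples(tuples, names=list(columns.names) + [name])

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})
df.columns = append_level(df.columns, "C")                # flat -> 2 levels
df.columns = append_level(df.columns, "D", name="extra")  # 2 -> 3 levels

print(df.columns.tolist())      # [('A', 'C', 'D'), ('B', 'C', 'D')]
print(list(df.columns.names))   # [None, None, 'extra']
```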
0

I have a dedicated function for this. It is less elegant, but more flexible. The advantages:

  • automatically handles Index and MultiIndex
  • can assign a name
  • can add multiple levels at once
  • can choose the location (top or bottom)

Best regards.

import pandas as pd

def addLevel(index, value='', name=None, n=1, onTop=False):
    """Add n extra constant ("dummy") levels to an Index or MultiIndex."""
    assert isinstance(index, pd.Index)  # pd.MultiIndex is a subclass of pd.Index
    # One array per existing level; get_level_values keeps each level's dtype
    # (stacking all labels into one NumPy array would coerce mixed values to strings)
    arrays = [index.get_level_values(i) for i in range(index.nlevels)]
    names = list(index.names)
    addValues = [[value] * len(index) for _ in range(n)]
    addName = [name] * n

    if onTop:
        names = addName + names
        arrays = addValues + arrays
    else:
        names = names + addName
        arrays = arrays + addValues

    return pd.MultiIndex.from_arrays(arrays, names=names)

df = pd.DataFrame(index=list('abc'), data={'A': range(3), 'B': range(3)})
df.columns = addLevel(df.columns, value='C')
df.columns = addLevel(df.columns, value='D', name='D-name')
df.columns = addLevel(df.columns, value='E2', n=2)
df.columns = addLevel(df.columns, value='Top', name='OnTop', onTop=True)
df.columns = addLevel(df.columns, value=1, name='Number')
print(df)
## OnTop  Top   
##          A  B
##          C  C
## D-name   D  D
##         E2 E2
##         E2 E2
## Number   1  1
## a        0  0
## b        1  1
## c        2  2
Vyga