How to simply add a column level to a pandas dataframe

Question

Let's say I have a dataframe that looks like this:

df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
 df
Out[92]: 
   A  B
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

Assuming that this dataframe already exists, how can I simply add a level 'C' to the column index so that I get this:

 df
Out[92]: 
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

I saw an SO answer like this one: "python/pandas: how to combine two dataframes into one with hierarchical column index?", but it concatenates different dataframes instead of adding a column level to an already existing dataframe.


Romain · Accepted Answer · 2016-10-24T19:37:03.110

139

As suggested by @StevenG himself, a better answer:

df.columns = pd.MultiIndex.from_product([df.columns, ['C']])

print(df)
#    A  B
#    C  C
# a  0  0
# b  1  1
# c  2  2
# d  3  3
# e  4  4

edited Oct 24 '16 at 19:37

answered Oct 24 '16 at 19:12

Romain


2

This is great, I like `pd.MultiIndex.from_product([df.columns, ['C']])` which is a bit more trivial since you don't have to keep track of the `len` of `df.columns`. Would you mind adding it to the answer so I can accept it? – Steven G Oct 24 '16 at 19:31
1

@StevenG great I did not know this trick. Thanks I have learned something new :-) – Romain Oct 24 '16 at 19:38
24

Do you have any tips, how to add another level, when the original df already has multiindex column names? I tried to add new level with from_product() method, however I received this error message: 'NotImplementedError: isnull is not defined for MultiIndex'. – Lenka Vraná Sep 15 '17 at 11:39
6

@LenkaVraná `pd.MultiIndex.from_product(df.columns.levels + [['C']])` – user3556757 Dec 27 '19 at 09:48
@user3556757 this unfortunately did not work for me (unhashable type 'index' or 'list') – ElectRocnic Jan 11 '20 at 12:08
EDIT: got it with `pd.MultiIndex.from_product([pd.Index(['C'])] + df.columns.levels)` (my order is inversed) (don't know what went wrong) – ElectRocnic Jan 11 '20 at 12:31
4

For anyone. I found casting the existing columns index to list before using it in MultiIndex.from_product works for 'isna not implemented'. `pd.MultiIndex.from_product([list(df.columns), ['C']])` – Max Jan 20 '20 at 11:15
Although you then have to flatten the indices. You could use `pd.concat([df], keys=[], names=[''],axis=1)` for the same result. – Max Jan 20 '20 at 11:41
Can this be done for one column only? Like this, for example: # A B C # a 0 0 # b 1 1 # c 2 2 # d 3 3 # e 4 4 – Jenny Nov 02 '22 at 10:07
1

Fastest (or at least tied) of all the answers I tested, and easiest to read. – fantabolous Apr 24 '23 at 06:48
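For the case raised in the comments above, where the columns are already a MultiIndex, one possible sketch is to rebuild the index from its per-column label arrays instead of from `levels`. `from_product` over `df.columns.levels` only reproduces the existing columns when they happen to form a full Cartesian product. The sample frame and its labels below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose columns are NOT a full product: ("B", "y") is missing.
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5]],
    columns=pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "x")]),
)

# One label array per existing level, in column order.
arrays = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]

# Append a constant array for the new bottom level.
arrays.append(["C"] * len(df.columns))

df.columns = pd.MultiIndex.from_arrays(arrays)
print(df)
```

Putting the constant array first in `arrays` instead of last would place the new level on top.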

score 28 · Answer 2 · answered Oct 24 '16 at 19:19


option 1
set_index and T

import numpy as np

df.T.set_index(np.repeat('C', df.shape[1]), append=True).T

option 2
pd.concat, keys, and swaplevel

pd.concat([df], axis=1, keys=['C']).swaplevel(0, 1, 1)
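A minimal sketch of what option 2 does step by step, using the question's `df` (the names `on_top` and `below` are only for illustration): `pd.concat` with `keys` puts the new label on the outermost level, and `swaplevel` then moves it underneath the original labels.

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# Step 1: concat with keys adds 'C' as the outermost (top) column level.
on_top = pd.concat([df], axis=1, keys=["C"])

# Step 2: swap the two column levels so 'C' sits below 'A' and 'B'.
below = on_top.swaplevel(0, 1, axis=1)

print(on_top.columns.tolist())
print(below.columns.tolist())
```

Neither step modifies `df` itself; both return new frames.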

answered Oct 24 '16 at 19:19

piRSquared


Thanks, I did not know about swap and this is convenient. I tested it for a large dataframe to see if it was more efficient than setting `pd.MultiIndex.from_product([df.columns, ['C']])` and it was about 25% slower. – Steven G Oct 24 '16 at 19:33
No surprises! Romain's answer is quicker. I added this because I think it's valuable to know. – piRSquared Oct 24 '16 at 19:34
13

`pd.concat([df], axis=1, keys=['C'])` worked very well for multilevel columns – Justislav Bogevolnov Mar 05 '18 at 11:25
1

Option 2 should be the accepted answer for the general case when `df.columns` can be a `pd.MultiIndex`. – Josh Jun 13 '19 at 02:50
The `pd.concat` answer is great because it doesn't modify the original df. – BallpointBen Jul 25 '19 at 17:18
Always watch out with .T since it can cause some disruption to well-typed columns. In general .T-.T transformations are lossy. Using seaborn, take `df = sns.load_dataset("diamonds")` and compare `df.info()` and `df.T.T.info()`; all columns turn into object and memory usage grows five times! – creanion May 26 '22 at 07:58
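The dtype concern with `.T` can be reproduced without seaborn. The small mixed-type frame below is an assumption for illustration only; it is not the diamonds dataset mentioned in the comment.

```python
import pandas as pd

# Hypothetical frame with one integer column and one string column.
df = pd.DataFrame({"n": [1, 2, 3], "s": ["x", "y", "z"]})

# Transposing mixes both columns into one block, so the round trip
# does not restore the original integer dtype.
round_trip = df.T.T

print(df.dtypes)
print(round_trip.dtypes)
```

This is why the `.T`-based option 1 may be a poor fit for frames with well-typed columns.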

score 17 · Answer 3 · answered Sep 10 '20 at 12:48


A solution which adds a name to the new level and is easier on the eyes than other answers already presented:

df['newlevel'] = 'C'
df = df.set_index('newlevel', append=True).unstack('newlevel')

print(df)
#           A  B
# newlevel  C  C
# a         0  0
# b         1  1
# c         2  2
# d         3  3
# e         4  4

answered Sep 10 '20 at 12:48

mbugert


7

This is short and works also with columns that are already multi-level! As a one liner: `df.assign(newlevel='C').set_index('newlevel', append=True).unstack('newlevel')`. – Michele Piccolini Mar 08 '21 at 14:07
2

If the dataframe has very many rows, this has a per-row cost which is unnecessary – creanion May 26 '22 at 08:07
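If the per-row cost is a concern, a named level can also be obtained by building the column index directly, which only touches the column labels. This is a sketch that combines the `names` argument of `pd.MultiIndex.from_product` with the accepted answer's approach; the variable `named` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# Work on a copy so the original frame keeps its single-level columns.
named = df.copy()
named.columns = pd.MultiIndex.from_product(
    [df.columns, ["C"]],
    names=[df.columns.name, "newlevel"],  # keep the old level name, name the new one
)

print(named)
```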

score 11 · Answer 4 · answered Sep 20 '21 at 08:16


You could just assign the columns like:

>>> df.columns = [df.columns, ['C', 'C']]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>>

Or for unknown length of columns:

>>> df.columns = [df.columns.get_level_values(0), np.repeat('C', df.shape[1])]
>>> df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
>>>
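Because the new level is just a list, it does not have to be constant. A small sketch, assuming made-up labels `'x'` and `'y'` and one label per existing column:

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

# Hypothetical per-column labels; the list must be as long as df.columns.
labels = ["x", "y"]

# Assigning a list of array-likes makes pandas build a MultiIndex from them.
df.columns = [df.columns, labels]

print(df.columns.tolist())
```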

answered Sep 20 '21 at 08:16

U13-Forward


1

This is the way, when you want to be flexible and assign any list as the new level – spettekaka Jul 19 '22 at 07:55

score 9 · Answer 5 · edited May 24 '20 at 14:08


Another way for a MultiIndex (appending 'E'):

df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[0], 'E', x[1]), df.columns))

   A  B
   E  E
   C  D
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4

edited May 24 '20 at 14:08

Itamar Mushkin


answered Nov 21 '19 at 08:10

Anton Abrosimov


6

A shorter version: `df.columns = pd.MultiIndex.from_tuples([(c[0], 'E', c[1]) for c in df.columns])` – Itamar Mushkin May 24 '20 at 14:14
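The tuple approach generalizes to any number of existing levels by slicing each column tuple. The helper `insert_level` below is hypothetical (it is not part of pandas), and the sample frame mirrors the two-level columns shown in the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0], [1, 1]],
    index=list("ab"),
    columns=pd.MultiIndex.from_tuples([("A", "C"), ("B", "D")]),
)


def insert_level(columns, label, position):
    """Return a new MultiIndex with `label` inserted at `position` in every column tuple."""
    return pd.MultiIndex.from_tuples(
        [(*c[:position], label, *c[position:]) for c in columns]
    )


# Insert 'E' between the two existing levels.
df.columns = insert_level(df.columns, "E", 1)
print(df.columns.tolist())
```

Using `position=0` would put the new level on top, and `position=len(c)` at the bottom.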

mcsoini · Answer 6 · 2022-05-26T07:55:47.860

I like it explicit (using MultiIndex) and chain-friendly (.set_axis):

df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)

This is particularly convenient when merging DataFrames with different column level numbers, where Pandas (1.4.2) raises a FutureWarning (FutureWarning: merging between different levels is deprecated and will be removed ... ):

import pandas as pd

df1 = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df2 = pd.DataFrame(index=list('abcde'), data=range(10, 15), columns=pd.MultiIndex.from_tuples([("C", "x")]))

# df1:
#    A  B
# a  0  0
# b  1  1

# df2:
#     C
#     x
# a  10
# b  11

# merge while giving df1 another column level:
pd.merge(df1.set_axis(pd.MultiIndex.from_product([df1.columns, ['']]), axis=1),
         df2, 
         left_index=True, right_index=True)

# result:
#    A  B   C
#           x
# a  0  0  10
# b  1  1  11

score 0 · Answer 7 · answered Dec 05 '22 at 20:29


Another method, but using a list comprehension of tuples as the arg to pandas.MultiIndex.from_tuples():

df.columns = pd.MultiIndex.from_tuples([(col, 'C') for col in df.columns])

df
   A  B
   C  C
a  0  0
b  1  1
c  2  2
d  3  3
e  4  4
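As a quick sanity check, the tuple-based index can be compared with the `from_product` version from the accepted answer. For a single constant label the two should describe the same columns:

```python
import pandas as pd

df = pd.DataFrame(index=list("abcde"), data={"A": range(5), "B": range(5)})

from_tuples = pd.MultiIndex.from_tuples([(col, "C") for col in df.columns])
from_product = pd.MultiIndex.from_product([df.columns, ["C"]])

# MultiIndex.equals compares the labels of both indexes.
same = from_tuples.equals(from_product)
print(same)
```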

answered Dec 05 '22 at 20:29

drT


score 0 · Answer 8 · answered Jun 08 '23 at 07:42

I have a dedicated function for this. It is less elegant, but more flexible. The advantages:

automatically handles Index and MultiIndex
can assign a name to the new level
can add multiple levels at once
can choose the location (top or bottom)

Best regards.

import numpy as np
import pandas as pd

def addLevel(index, value='', name=None, n=1, onTop=False):
    """Add extra dummy levels to index"""
    assert isinstance(index, (pd.MultiIndex, pd.Index))
    xar = np.array(index.tolist()).transpose()
    names = index.names if isinstance(index, pd.MultiIndex) else [index.name]
    addValues = np.full(shape=(n, xar.shape[-1]), fill_value=value)
    addName = [name] * n

    if onTop:
        names = addName + names
        xar = np.vstack([addValues, xar])
    else:
        names = names + addName
        xar = np.vstack([xar, addValues])

    return pd.MultiIndex.from_arrays(xar, names=names)
    
df = pd.DataFrame(index=list('abc'), data={'A': range(3), 'B': range(3)})
df.columns = addLevel(df.columns, value='C')
df.columns = addLevel(df.columns, value='D', name='D-name')
df.columns = addLevel(df.columns, value='E2', n=2)
df.columns = addLevel(df.columns, value='Top', name='OnTop', onTop=True)
df.columns = addLevel(df.columns, value=1, name='Number')
print(df)
## OnTop  Top   
##          A  B
##          C  C
## D-name   D  D
##         E2 E2
##         E2 E2
## Number   1  1
## a        0  0
## b        1  1
## c        2  2
