Break dataframe header into multiheader

Question

Names	ABCBaseCIP00	ABCBaseCIP01	ABCBaseCIP02	ABC1CIP00	ABC1CIP01	ABC1CIP02	ABC2CIP00	ABC2CIP01	ABC2CIP02
X	1	2	3	4	5	6	7	8	9
Y	1	2	3	4	5	6	7	8	9
Z	1	2	3	4	5	6	7	8	9

I have the above dataframe and want to split the column headers into a name (ABCBase|ABC1|ABC2) and a code (CIP00|CIP01|CIP02|CIP00|CIP01|CIP02|CIP00|CIP01|CIP02) to get the table below as output.

Can anyone suggest how this can be done in pandas? The data is dynamic, so I do not want to hardcode anything.

	ABCBase	ABCBase	ABCBase	ABC1	ABC1	ABC1	ABC2	ABC2	ABC2
Names	CIP00	CIP01	CIP02	CIP00	CIP01	CIP02	CIP00	CIP01	CIP02
X	1	2	3	4	5	6	7	8	9
Y	1	2	3	4	5	6	7	8	9
Z	1	2	3	4	5	6	7	8	9

Does [this](https://stackoverflow.com/questions/32370402/giving-a-column-multiple-indexes-headers) answer your question? Or are you really looking for separate headers? — Thymen, Dec 28 '20 at 21:05

score 0 · Answer 1 · answered Dec 28 '20 at 22:27

import pandas as pd
data = { 'names' : ['x','y','z'],
         'ABCBaseCIP00' : [1,1,1],
         'ABCBaseCIP01' : [2,2,2],
         'ABCBaseCIP02' : [3,3,3],
         'ABC1CIP00' : [4,4,4],
         'ABC1CIP01' : [5,5,5]}
df = pd.DataFrame(data)

gives

    names   ABCBaseCIP00    ABCBaseCIP01    ABCBaseCIP02    ABC1CIP00   ABC1CIP01
0   x       1               2               3               4           5
1   y       1               2               3               4           5
2   z       1               2               3               4           5

Now do the work

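# Transpose so the column labels become row values that can be sliced.
df1 = df.T
df1.reset_index(inplace=True)
# The last five characters are the code; everything before them is the prefix.
df1['name']=df1['index'].str[-5:]
df1['subname']=df1['index'].str[0:-5]

# Drop the original labels and transpose back.
df1 = df1.drop('index',axis=1)
df1 = df1.T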

which gives

            0       1       2       3       4       5
0           x       1       2       3       4       5
1           y       1       2       3       4       5
2           z       1       2       3       4       5
name        names   CIP00   CIP01   CIP02   CIP00   CIP01
subname             ABCBase ABCBase ABCBase ABC1    ABC1

This is not quite what you want, but is it close enough?
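The same slicing idea can be pushed one step further to produce real column levels instead of extra rows. This is only a sketch: it assumes the code is always the last five characters of each column name (as the slicing above does), and the names `top` and `bottom` are made up for illustration.

```python
import pandas as pd

# Same sample data as above, with the label column moved into the index.
df = pd.DataFrame({
    'names': ['x', 'y', 'z'],
    'ABCBaseCIP00': [1, 1, 1],
    'ABCBaseCIP01': [2, 2, 2],
    'ABCBaseCIP02': [3, 3, 3],
    'ABC1CIP00': [4, 4, 4],
    'ABC1CIP01': [5, 5, 5],
}).set_index('names')

# Assumption: the code is always the last five characters of the column name.
top = df.columns.str[:-5]      # prefix, e.g. 'ABCBase' or 'ABC1'
bottom = df.columns.str[-5:]   # code, e.g. 'CIP00'

# Build a two-level column index from the two slices.
df.columns = pd.MultiIndex.from_arrays([top, bottom])
```

If the codes can vary in length, fixed-width slicing will not hold and a pattern-based split is probably the safer route.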

score 0 · Accepted Answer · answered Dec 28 '20 at 22:47

Here's a way using string manipulation and pd.MultiIndex with from_arrays:

df = df.set_index('Names')

cols = df.columns.str.extract(r'(ABC(?:Base|\d))(.*)')
df.columns = pd.MultiIndex.from_arrays([cols[0], cols[1]], names=[None, None])

df

Output:

      ABCBase              ABC1              ABC2            
        CIP00 CIP01 CIP02 CIP00 CIP01 CIP02 CIP00 CIP01 CIP02
Names                                                        
X           1     2     3     4     5     6     7     8     9
Y           1     2     3     4     5     6     7     8     9
Z           1     2     3     4     5     6     7     8     9

Or,

df.columns = pd.MultiIndex\
               .from_arrays(zip(*df.columns.str.extract(r'(ABC(?:Base|\d))(.*)')\
               .to_numpy()))
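Since the question asks not to hardcode anything, the pattern could be loosened so it does not name the ABC prefixes. The sketch below is one possible variant, not part of the original answer: it assumes every column ends in "CIP" followed by digits and treats whatever comes before as the name. The variable `parts` is illustrative.

```python
import pandas as pd

# Rebuild the frame from the question, with Names as the index.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9]] * 3,
    index=pd.Index(['X', 'Y', 'Z'], name='Names'),
    columns=['ABCBaseCIP00', 'ABCBaseCIP01', 'ABCBaseCIP02',
             'ABC1CIP00', 'ABC1CIP01', 'ABC1CIP02',
             'ABC2CIP00', 'ABC2CIP01', 'ABC2CIP02'],
)

# Assumption: each column is "<name>CIP<digits>". The lazy group takes
# the name, the second group takes the trailing code.
parts = df.columns.str.extract(r'^(.*?)(CIP\d+)$')

# Column 0 of the extract result is the name, column 1 is the code.
df.columns = pd.MultiIndex.from_arrays([parts[0], parts[1]], names=[None, None])
```

A column that does not match the pattern would come back as NaN in both levels, so it may be worth checking `parts.isna().any()` before assigning.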

score 0 · Answer 3 · answered Dec 29 '20 at 20:34

A one-line solution to this problem:

df.columns = df.columns.str.split('(CIP.+)', expand=True).droplevel(2)

Full example:

from pandas import DataFrame, Index
df = DataFrame(
  { 'ABCBaseCIP00': [1,1,1],
    'ABCBaseCIP01': [2,2,2],
    'ABCBaseCIP02': [3,3,3],
    'ABC1CIP00': [4,4,4],
    'ABC1CIP01': [5,5,5] }, 
  index=Index(list('XYZ'), name='Names')
  )
df.columns = df.columns.str.split('(CIP.+)', expand=True).droplevel(2)
# df outputs:
      ABCBase              ABC1      
        CIP00 CIP01 CIP02 CIP00 CIP01
Names                                
X           1     2     3     4     5
Y           1     2     3     4     5
Z           1     2     3     4     5

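How it works:

The regex CIP.+ matches from the start of the level-2 label (the CIP code) to the end of the column name. The parentheses () create a capture group, so the matched text is returned by .str.split instead of being discarded.
Splitting an index with expand=True creates a multi-index.
The resulting multi-index has an extra level (the empty string left after the match), which is dropped with .droplevel(2).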
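To see where the extra level comes from, the split can be tried on a single label first. This is a small illustrative sketch: `re.split` stands in for what the string accessor does on each label, and the names `pieces`, `cols`, `expanded` and `trimmed` are made up here. The index is built with `dtype=object` as an assumption, so the example stays on the plain-Python regex path.

```python
import re
import pandas as pd

# A capture group makes split() keep the matched text; the trailing
# empty string is what remains after the match.
pieces = re.split('(CIP.+)', 'ABCBaseCIP00')
print(pieces)  # ['ABCBase', 'CIP00', '']

# The same split applied to an index, expanded into levels.
cols = pd.Index(['ABCBaseCIP00', 'ABC1CIP01'], dtype=object)
expanded = cols.str.split('(CIP.+)', expand=True)
print(expanded.nlevels)  # 3

# Drop the third level, which holds only empty strings.
trimmed = expanded.droplevel(2)
print(list(trimmed))  # [('ABCBase', 'CIP00'), ('ABC1', 'CIP01')]
```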
