Names ABCBaseCIP00 ABCBaseCIP01 ABCBaseCIP02 ABC1CIP00 ABC1CIP01 ABC1CIP02 ABC2CIP00 ABC2CIP01 ABC2CIP02
X 1 2 3 4 5 6 7 8 9
Y 1 2 3 4 5 6 7 8 9
Z 1 2 3 4 5 6 7 8 9

I have the dataframe above and want to split each column header into a name (ABCBase|ABC1|ABC2) and a code (CIP00|CIP01|CIP02), to get the table below as output.

Can anyone suggest how this can be done in pandas? The data is dynamic, so I do not want to hardcode anything.

ABCBase ABCBase ABCBase ABC1 ABC1 ABC1 ABC2 ABC2 ABC2
Names CIP00 CIP01 CIP02 CIP00 CIP01 CIP02 CIP00 CIP01 CIP02
X 1 2 3 4 5 6 7 8 9
Y 1 2 3 4 5 6 7 8 9
Z 1 2 3 4 5 6 7 8 9
user215865
  • Does [this](https://stackoverflow.com/questions/32370402/giving-a-column-multiple-indexes-headers) answer your question? Or are you really looking for separate headers? – Thymen Dec 28 '20 at 21:05

3 Answers

import pandas as pd
data = { 'names' : ['x','y','z'],
         'ABCBaseCIP00' : [1,1,1],
         'ABCBaseCIP01' : [2,2,2],
         'ABCBaseCIP02' : [3,3,3],
         'ABC1CIP00' : [4,4,4],
         'ABC1CIP01' : [5,5,5]}
df = pd.DataFrame(data)

gives

    names   ABCBaseCIP00    ABCBaseCIP01    ABCBaseCIP02    ABC1CIP00   ABC1CIP01
0   x       1               2               3               4           5
1   y       1               2               3               4           5
2   z       1               2               3               4           5

Now do the work

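# Transpose so the column headers become row labels, then expose them as a column
df1 = df.T
df1.reset_index(inplace=True)
# The last five characters are the code (e.g. CIP00); the rest is the group prefix
df1['name'] = df1['index'].str[-5:]
df1['subname'] = df1['index'].str[0:-5]

# Drop the original headers and transpose back
df1 = df1.drop('index', axis=1)
df1 = df1.T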

which gives

            0       1       2       3       4       5
0           x       1       2       3       4       5
1           y       1       2       3       4       5
2           z       1       2       3       4       5
name        names   CIP00   CIP01   CIP02   CIP00   CIP01
subname             ABCBase ABCBase ABCBase ABC1    ABC1

This is not quite what you asked for, but is it close enough?
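
If the goal is a real two-level header rather than two extra rows, the same fixed-width slicing can feed pd.MultiIndex.from_arrays. A minimal sketch, assuming every code is exactly five characters long (as in CIP00) and that the names column is moved into the index first; the variable names prefix and code are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'names': ['x', 'y', 'z'],
    'ABCBaseCIP00': [1, 1, 1],
    'ABCBaseCIP01': [2, 2, 2],
    'ABCBaseCIP02': [3, 3, 3],
    'ABC1CIP00': [4, 4, 4],
    'ABC1CIP01': [5, 5, 5],
})

# Move the label column into the index so only data columns get split.
df = df.set_index('names')

# Assumption: the code is always the last five characters of the header.
prefix = df.columns.str[:-5]   # 'ABCBase', 'ABC1', ...
code = df.columns.str[-5:]     # 'CIP00', 'CIP01', ...

# Two parallel arrays become the two header levels.
df.columns = pd.MultiIndex.from_arrays([prefix, code])
```

This breaks as soon as a code has a different length, so it only suits data where the suffix width is known to be fixed.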

Paul Brennan

Here's a way using string manipulation and pd.MultiIndex with from_arrays:

df = df.set_index('Names')

# Raw string so that \d reaches the regex engine without an invalid-escape warning
cols = df.columns.str.extract(r'(ABC(?:Base|\d))(.*)')
df.columns = pd.MultiIndex.from_arrays([cols[0], cols[1]], names=[None, None])

df

Output:

      ABCBase              ABC1              ABC2            
        CIP00 CIP01 CIP02 CIP00 CIP01 CIP02 CIP00 CIP01 CIP02
Names                                                        
X           1     2     3     4     5     6     7     8     9
Y           1     2     3     4     5     6     7     8     9
Z           1     2     3     4     5     6     7     8     9

Or,

df.columns = pd.MultiIndex.from_arrays(
    zip(*df.columns.str.extract(r'(ABC(?:Base|\d))(.*)').to_numpy())
)
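
Since the question asks not to hardcode anything, the group names do not have to appear in the pattern. A sketch, assuming only that every header ends in a code of the form CIP followed by digits (that assumption and the variable name parts are for illustration and are not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9]] * 3,
    index=pd.Index(['X', 'Y', 'Z'], name='Names'),
    columns=['ABCBaseCIP00', 'ABCBaseCIP01', 'ABCBaseCIP02',
             'ABC1CIP00', 'ABC1CIP01', 'ABC1CIP02',
             'ABC2CIP00', 'ABC2CIP01', 'ABC2CIP02'],
)

# Group 1: everything before the code; group 2: the trailing CIP code.
parts = df.columns.str.extract(r'^(.*)(CIP\d+)$')

# Each extracted column becomes one level of the new header.
df.columns = pd.MultiIndex.from_arrays([parts[0], parts[1]], names=[None, None])
```

A header that does not end in such a code would come back as NaN in both groups, so it is worth checking parts for missing values before assigning.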
Scott Boston

A one-line solution to this problem:

df.columns = df.columns.str.split('(CIP.+)', expand=True).droplevel(2)

Full example:

from pandas import DataFrame, Index
df = DataFrame(
  { 'ABCBaseCIP00': [1,1,1],
    'ABCBaseCIP01': [2,2,2],
    'ABCBaseCIP02': [3,3,3],
    'ABC1CIP00': [4,4,4],
    'ABC1CIP01': [5,5,5] }, 
  index=Index(list('XYZ'), name='Names')
  )
df.columns = df.columns.str.split('(CIP.+)', expand=True).droplevel(2)
# df outputs:
      ABCBase              ABC1      
        CIP00 CIP01 CIP02 CIP00 CIP01
Names                                
X           1     2     3     4     5
Y           1     2     3     4     5
Z           1     2     3     4     5

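How it works:

  1. the regex CIP.+ matches from the start of the level-2 label (the code) through the end of the header. The parentheses () create a capture group, so the matched text is returned by .str.split instead of being discarded
  2. splitting and expanding an index (expand=True) creates a multi-index
  3. the resulting multi-index has an extra level, holding the empty string left after the match, which is dropped with .droplevel(2)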
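
The steps above can be seen on a single header with Python's re.split, which, as far as I know, follows the same regex splitting rules that .str.split applies to a pattern; the variable name pieces is just for illustration:

```python
import re

# With a capture group, re.split keeps the matched text in the result.
pieces = re.split('(CIP.+)', 'ABCBaseCIP00')
print(pieces)  # ['ABCBase', 'CIP00', '']

# Without the group, the code would be discarded.
print(re.split('CIP.+', 'ABCBaseCIP00'))  # ['ABCBase', '']
```

The trailing empty string is what becomes the third level that .droplevel(2) removes.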
Haleemur Ali