5

I have a dataframe like this :

df = pd.DataFrame(pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]],columns=["X_a","Y_b","X_b","Y_a"]))

   X_a  Y_b  X_b  Y_a
0    1    2    3    4
1    5    6    7    8
2    9   10   11   12

Now I basically what to create a dataframe with multilevel columns by splitting the columns based on _ (underscore) and grouping them based on the initial Alphabet in capital. An example of this transformation on the above dataframe can be this :

     X         Y
     a    b    a    b
0    1    3    4    2
1    5    7    8    6
2    9   11   12   10

I tried searching for a solution but the closest I got was this answer, which does not solve my problem exactly. So, is there any efficient or quicker way to do this in Pandas besides the brute force way of extracting the columns, then splitting them and finally arranging them together ? Any help would be appreciated.

Gambit1614
  • 8,547
  • 1
  • 25
  • 51

1 Answers1

11

In place

df.columns = df.columns.str.split('_', expand=True)
df.sort_index(axis=1)

   X       Y    
   a   b   a   b
0  1   3   4   2
1  5   7   8   6
2  9  11  12  10

Inline
Not altering the original

pd.DataFrame(
    df.values, columns=df.columns.str.split('_', expand=True)).sort_index(1)

   X       Y    
   a   b   a   b
0  1   3   4   2
1  5   7   8   6
2  9  11  12  10
piRSquared
  • 285,575
  • 57
  • 475
  • 624