Select columns in pandas dataframe and group them with multiindex

Question

I have a huge dataframe with 126 columns. I want to add an extra level (multiindex) where i would like to have 5 categories, so each of 126 columns would fall under the corresponding category. I have found many solutions with defining levels and writing down all columns you want to append to that level, which is really time consuming since i have to group 126 columns. Is there a faster way to do this? For example with slicing columns like .iloc[:,9:44], since i want to group those 35 columns to one category?

Dataframe looks like this:

    df
        a    b     c...  d    e     f...  g    h    i...  j    k    l... n=126

 1     1.0  1.0   1.0   2.0   3.0   2.0   1.0  1.0  1.0  2.0   3.0   2.0 
 2     4.0  5.0   4.0   4.0   8.0   4.0   4.0  5.0  4.0  4.0   8.0   4.0
 3     6.0  1.0   6.0   7.0   8.0   7.0   6.0  1.0  6.0  7.0   8.0   7.0

The solution would look like this:

    df2
              A          |        B         |       C          |       D    n=5
        a    b     c...  |  d     e    f... |  g    h     i... |   j   k  l n=126 

1      1.0  1.0   1.0    2.0  3.0   2.0    1.0  1.0   1.0    2.0  3.0   2.0
2      4.0  5.0   4.0    4.0  8.0   4.0    4.0  5.0   4.0    4.0  8.0   4.0
3      6.0  1.0   6.0    7.0  8.0   7.0    6.0  1.0   6.0    7.0  8.0   7.0

a,b,c,d etc. represent name of the biomarker while A, B, C, D would represent the group of biomarkers. I want to add group level for better order. — KayEss, Apr 30 '20 at 09:46

jezrael · Accepted Answer · 2020-04-30T10:03:34.870

If want assign each N values to separate category create dictionary for each N chunks and then map:

#https://stackoverflow.com/a/312464/2901002
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

L = ['A','B','C','D']
d = {v: k for k, x in zip(L, chunks(df.columns, 3)) for v in x}
print (d)
{'a': 'A', 'b': 'A', 'c': 'A', 
 'd': 'B', 'e': 'B', 'f': 'B', 
 'g': 'C', 'h': 'C', 'i': 'C', 
 'j': 'D', 'k': 'D', 'l': 'D'}

df.columns = [df.columns.map(d), df.columns]
print (df)

     A              B              C              D          
     a    b    c    d    e    f    g    h    i    j    k    l
1  1.0  1.0  1.0  2.0  3.0  2.0  1.0  1.0  1.0  2.0  3.0  2.0
2  4.0  5.0  4.0  4.0  8.0  4.0  4.0  5.0  4.0  4.0  8.0  4.0
3  6.0  1.0  6.0  7.0  8.0  7.0  6.0  1.0  6.0  7.0  8.0  7.0

EDIT: If need set columns by positions:

d1 = {'A':df.columns[0:3],
      'B':df.columns[3:6],
      'C':df.columns[6:9],
      'D':df.columns[9:12]}
print (d1)
{'A': Index(['a', 'b', 'c'], dtype='object'), 
 'B': Index(['d', 'e', 'f'], dtype='object'), 
 'C': Index(['g', 'h', 'i'], dtype='object'), 
 'D': Index(['j', 'k', 'l'], dtype='object')}

d =  {v: k for k, x in d1.items() for v in x}
print (d)
{'a': 'A', 'b': 'A', 'c': 'A', 
 'd': 'B', 'e': 'B', 'f': 'B', 
 'g': 'C', 'h': 'C', 'i': 'C', 
 'j': 'D', 'k': 'D', 'l': 'D'}

df.columns = [df.columns.map(d), df.columns]
print (df)
     A              B              C              D          
     a    b    c    d    e    f    g    h    i    j    k    l
1  1.0  1.0  1.0  2.0  3.0  2.0  1.0  1.0  1.0  2.0  3.0  2.0
2  4.0  5.0  4.0  4.0  8.0  4.0  4.0  5.0  4.0  4.0  8.0  4.0
3  6.0  1.0  6.0  7.0  8.0  7.0  6.0  1.0  6.0  7.0  8.0  7.0

this works, thank you. But some columns are assigned to wrong groups in my dataset. Is it possible to define exactly which columns fall under which category without writing their names down? For example df.iloc[:,0:8] = group1, df.iloc[:,9:44] = group2 etc.? — KayEss, Apr 30 '20 at 09:58

Select columns in pandas dataframe and group them with multiindex

1 Answers1