1

I have a huge dataframe with 126 columns. I want to add an extra level (multiindex) where i would like to have 5 categories, so each of 126 columns would fall under the corresponding category. I have found many solutions with defining levels and writing down all columns you want to append to that level, which is really time consuming since i have to group 126 columns. Is there a faster way to do this? For example with slicing columns like .iloc[:,9:44], since i want to group those 35 columns to one category?

Dataframe looks like this:

    df
        a    b     c...  d    e     f...  g    h    i...  j    k    l... n=126

 1     1.0  1.0   1.0   2.0   3.0   2.0   1.0  1.0  1.0  2.0   3.0   2.0 
 2     4.0  5.0   4.0   4.0   8.0   4.0   4.0  5.0  4.0  4.0   8.0   4.0
 3     6.0  1.0   6.0   7.0   8.0   7.0   6.0  1.0  6.0  7.0   8.0   7.0

The solution would look like this:

    df2
              A          |        B         |       C          |       D    n=5
        a    b     c...  |  d     e    f... |  g    h     i... |   j   k  l n=126 

1      1.0  1.0   1.0    2.0  3.0   2.0    1.0  1.0   1.0    2.0  3.0   2.0
2      4.0  5.0   4.0    4.0  8.0   4.0    4.0  5.0   4.0    4.0  8.0   4.0
3      6.0  1.0   6.0    7.0  8.0   7.0    6.0  1.0   6.0    7.0  8.0   7.0
KayEss
  • 419
  • 4
  • 18

1 Answers1

1

If want assign each N values to separate category create dictionary for each N chunks and then map:

#https://stackoverflow.com/a/312464/2901002
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

L = ['A','B','C','D']
d = {v: k for k, x in zip(L, chunks(df.columns, 3)) for v in x}
print (d)
{'a': 'A', 'b': 'A', 'c': 'A', 
 'd': 'B', 'e': 'B', 'f': 'B', 
 'g': 'C', 'h': 'C', 'i': 'C', 
 'j': 'D', 'k': 'D', 'l': 'D'}

df.columns = [df.columns.map(d), df.columns]
print (df)

     A              B              C              D          
     a    b    c    d    e    f    g    h    i    j    k    l
1  1.0  1.0  1.0  2.0  3.0  2.0  1.0  1.0  1.0  2.0  3.0  2.0
2  4.0  5.0  4.0  4.0  8.0  4.0  4.0  5.0  4.0  4.0  8.0  4.0
3  6.0  1.0  6.0  7.0  8.0  7.0  6.0  1.0  6.0  7.0  8.0  7.0

EDIT: If need set columns by positions:

d1 = {'A':df.columns[0:3],
      'B':df.columns[3:6],
      'C':df.columns[6:9],
      'D':df.columns[9:12]}
print (d1)
{'A': Index(['a', 'b', 'c'], dtype='object'), 
 'B': Index(['d', 'e', 'f'], dtype='object'), 
 'C': Index(['g', 'h', 'i'], dtype='object'), 
 'D': Index(['j', 'k', 'l'], dtype='object')}

d =  {v: k for k, x in d1.items() for v in x}
print (d)
{'a': 'A', 'b': 'A', 'c': 'A', 
 'd': 'B', 'e': 'B', 'f': 'B', 
 'g': 'C', 'h': 'C', 'i': 'C', 
 'j': 'D', 'k': 'D', 'l': 'D'}

df.columns = [df.columns.map(d), df.columns]
print (df)
     A              B              C              D          
     a    b    c    d    e    f    g    h    i    j    k    l
1  1.0  1.0  1.0  2.0  3.0  2.0  1.0  1.0  1.0  2.0  3.0  2.0
2  4.0  5.0  4.0  4.0  8.0  4.0  4.0  5.0  4.0  4.0  8.0  4.0
3  6.0  1.0  6.0  7.0  8.0  7.0  6.0  1.0  6.0  7.0  8.0  7.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    this works, thank you. But some columns are assigned to wrong groups in my dataset. Is it possible to define exactly which columns fall under which category without writing their names down? For example df.iloc[:,0:8] = group1, df.iloc[:,9:44] = group2 etc.? – KayEss Apr 30 '20 at 09:58
  • @Kuki - Sure, give me a sec. – jezrael Apr 30 '20 at 09:59