Pandas: Is there a way to use something like 'droplevel' and in process, rename the other level using the dropped level labels as prefix/suffix?

Question

Screenshot of the query below:

Is there a way to easily drop the upper level column index and a have a single level with labels such as points_prev_amax, points_prev_amin, gf_prev_amax, gf_prev_amin and so on?

jezrael · Accepted Answer · 2017-07-20T09:31:29.767

Use list comprehension for set new column names:

df.columns = df.columns.map('_'.join)

Or:

df.columns = ['_'.join(col) for col in df.columns]

Sample:

df = pd.DataFrame({'A':[1,2,2,1],
                   'B':[4,5,6,4],
                   'C':[7,8,9,1],
                   'D':[1,3,5,9]})

print (df)
   A  B  C  D
0  1  4  7  1
1  2  5  8  3
2  2  6  9  5
3  1  4  1  9

df = df.groupby('A').agg([max, min])

df.columns = df.columns.map('_'.join)
print (df)
   B_max  B_min  C_max  C_min  D_max  D_min
A                                          
1      4      4      7      1      9      1
2      6      5      9      8      5      3

print (['_'.join(col) for col in df.columns])
['B_max', 'B_min', 'C_max', 'C_min', 'D_max', 'D_min']

df.columns = ['_'.join(col) for col in df.columns]
print (df)
   B_max  B_min  C_max  C_min  D_max  D_min
A                                          
1      4      4      7      1      9      1
2      6      5      9      8      5      3

If need prefix simple swap items of tuples:

df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
print (df)
   max_B  min_B  max_C  min_C  max_D  min_D
A                                          
1      4      4      7      1      9      1
2      6      5      9      8      5      3

Another solution:

df.columns = ['{}_{}'.format(i[1], i[0]) for i in df.columns]
print (df)
   max_B  min_B  max_C  min_C  max_D  min_D
A                                          
1      4      4      7      1      9      1
2      6      5      9      8      5      3

If len of columns is big (10^6), then rather use to_series and str.join:

df.columns = df.columns.to_series().str.join('_')

score 2 · Answer 2 · answered Sep 09 '16 at 08:38

2

Using @jezrael's setup

df = pd.DataFrame({'A':[1,2,2,1],
                   'B':[4,5,6,4],
                   'C':[7,8,9,1],
                   'D':[1,3,5,9]})

df = df.groupby('A').agg([max, min])

Assign new columns with

from itertools import starmap

def flat(midx, sep=''):
    fstr = sep.join(['{}'] * midx.nlevels)
    return pd.Index(starmap(fstr.format, midx))

df.columns = flat(df.columns, '_')

df

answered Sep 09 '16 at 08:38

piRSquared

285,575
57
475
624

@jezrael This is a new one I came up with today ;-) comprehension is still slightly faster. – piRSquared Sep 09 '16 at 08:39
I think there is one exception - if len of columns is very big (few 10^6), then this is faster. `df.columns = df.columns.to_series().str.join('_')`. But I think practically len of `columns` is small, so list comprehension is better. – jezrael Sep 09 '16 at 08:43
@jezrael it is also faster when there are more levels. `pd.MultiIndex.from_product([list('ABCD'), range(4), list('wxyz')])` – piRSquared Sep 09 '16 at 08:46
Btw, I am surprised then no function for this. – jezrael Sep 09 '16 at 08:48
@jezrael me too. One of us should start contributing to pandas :-) – piRSquared Sep 09 '16 at 08:49
No me, because I am not low code developer as you. so you win it ;) – jezrael Sep 09 '16 at 08:49
I am going check github if some more info about this function. – jezrael Sep 09 '16 at 08:50

Pandas: Is there a way to use something like 'droplevel' and in process, rename the other level using the dropped level labels as prefix/suffix?

2 Answers2

Linked