1

Related to this question, I've been trying to use melt but without success..

I've got a DataFrame with 1 row, like this:

   A   B   C   total   date   A_size   B_size   C_size   total_size
0  4   2   5    11 2019-01-01  123      456      789        1368

Which I'd like to turn into this (at this point I don't care about date anymore):

      Values     Sizes
A        4        123
B        2        456
C        5        789
total    11       1368

I've got something terribly hacky that does the job, but it's not flexible. I'd like to be able to add D and D_size without having to modify the downstream code.

Hacky code:

def format_table(todays_metadata: pd.DataFrame):
    todays_metadata_reformat = todays_metadata.loc[:, 'A':'total'] # hardcoded 'A'
    todays_metadata_reformat.index = ['Values']
    sizes = todays_metadata.loc[:, 'A_size':'total_size'] # hardcoded 'A_size'
    sizes.index = ['Sizes']
    sizes.columns = todays_metadata_reformat.columns
    todays_metadata_reformat = 
    todays_metadata_reformat.append(sizes).transpose()
    return todays_metadata_reformat
petezurich
  • 9,280
  • 9
  • 43
  • 57
Guus
  • 234
  • 2
  • 11

1 Answers1

1

You can check the index for position of the total column by pd.Index.get_loc and create slices:

df1=df.drop('date',1)
i=df1.columns[:df1.columns.get_loc('total')+1] # ['A', 'B', 'C', 'total']
j=df1.columns[df1.columns.get_loc('total')+1:] #['A_size','B_size','C_size','total_size']

Then melt and concat:

m=df1[j].melt(value_name='size')
m.index=m.pop('variable').str.split('_').str[0]
pd.concat([df1[i].melt().set_index('variable'),m],axis=1)

EDIT:

another way using df.filter():

df1=df.drop('date',1)
m=df.filter(like='size')

x=df1[df1.columns.difference(m.columns,sort=False)].melt().set_index('variable')
y=m.rename(columns=lambda x: x.split('_')[0]).melt(value_name='Sizes').set_index('variable')
print(pd.concat([x,y],axis=1))

           value  Sizes
variable              
A             4    123
B             2    456
C             5    789
total        11   1368
anky
  • 74,114
  • 11
  • 41
  • 70
  • Is there a way to make it more robust, i.e. not rely on ordering of the columns? I'd like it to be flexible enough that anything appended with `_size` it can link to the Values of that object, if at all possible – Guus Jul 23 '19 at 12:31
  • @Guus added another way, please check if it scales for you – anky Jul 23 '19 at 12:40
  • 1
    that's great, thanks! I had to replace the `x.split('_')` with `x.replace('_size', '')` as my actual values could contain underscores as well. But that's a beautifully flexible solution – Guus Jul 23 '19 at 13:35