How to "unpivot" a spreadsheet with multiple, nested dimensions using pandas?

Question

I've attached two Excel screenshots illustrating my problem. I have data in a spreadsheet organized like this (sort of pivot-y):

...and I want to transform it (unpivot) into a vertical format like this:

Please note that a simple transpose will not solve this. If you look closely at the Excel file (specifically the 'How I Want It' tab), you'll see there's actually a total of 5 dimensions I'm dealing with.

Any guidance on approaches? Really hoping this can be done with Pandas.. not so sure it can be.

Link to Excel file here:

https://drive.google.com/open?id=1hZLQPukHcJwfwmkoPwOPPv5_JViwgL6v

Essentially, yes but I need to transpose from multiple, nested dimensions here. The index is actually defined across multiple row/column attributes. Hence my problem. — idiocrash, Nov 08 '17 at 07:18
@cᴏʟᴅsᴘᴇᴇᴅ - I reopen answer because solution is a bit complicated, not transpose only. — jezrael, Nov 08 '17 at 07:22
@idiocrash - Maybe the main problem is convert excel to DataFrame with MultiIndex, if my solution does not work is possible share your sample excel file? gdocs, dropbox, wetransfer... — jezrael, Nov 08 '17 at 07:24
@jezrael For future reference, if a question is closed incorrectly, please attempt to search for the correct one and edit the list. I'm 90% sure there is a duplicate for this question, but you would rather answer it than have it closed... — cs95, Nov 08 '17 at 07:29
@jezrael You're free to do what you want, but I have a word of advice... if a question is not in a state that writing an answer is guaranteed to solve the OP's problem (as you yourself admit you're not sure), then leave it closed! Or, at least, vote to close as "too broad"/"no mcve". — cs95, Nov 08 '17 at 07:31
@cᴏʟᴅsᴘᴇᴇᴅ - you know how it think - if excel is ideal, then my solution working nice, so answer was added. But if necessary convert unnamed `MultiIndex` values it is not possible without file... — jezrael, Nov 08 '17 at 07:34
Here's a link to the Excel: https://drive.google.com/open?id=1hZLQPukHcJwfwmkoPwOPPv5_JViwgL6v — idiocrash, Nov 08 '17 at 07:35

jezrael · Accepted Answer · 2017-11-08T08:46:56.037

I think you need unstack + reset_index for reshape data:

#create `MultiIndex` DataFrame
df = pd.read_excel('HowItIs.xlsx', header=[0,1], index_col=[0])
#print (df)

#get names for new columns values
a, b = df.columns.names

d = {'level_2':'Category', 'Section Subheading':'Mesurement_Period_Week',
      'SECTION TITLE':'Mesurement_Period'}

fin_order = ['Section_Title','Section_Subheading','Category','Mesurement_Period',
            'Mesurement_Period_Week','Value']
df = (df.unstack()
        .reset_index(name='Value')
        .rename(columns=d)
        .sort_values(['Category','Mesurement_Period','Mesurement_Period_Week'])
        .assign(Section_Title = a, Section_Subheading=b)
        .reset_index(drop=True)
        [fin_order]
        )
print (df.head())
   Section_Title  Section_Subheading Category    Mesurement_Period  \
0  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
1  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
2  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
3  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
4  SECTION TITLE  Section Subheading   Apples  FY15 P03 Dec Weight   

  Mesurement_Period_Week  Value  
0                     W1     53  
1                     W2     53  
2                     W3     53  
3                     W4     53  
4                     W1   2120

Last if need increment each value in Value column per groups:

df['Value'] += df.groupby(['Category','Mesurement_Period']).cumcount()
print (df.head())
   Section_Title  Section_Subheading Category    Mesurement_Period  \
0  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
1  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
2  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
3  SECTION TITLE  Section Subheading   Apples   FY15 P03 Dec Count   
4  SECTION TITLE  Section Subheading   Apples  FY15 P03 Dec Weight   

  Mesurement_Period_Week  Value  
0                     W1     53  
1                     W2     54  
2                     W3     55  
3                     W4     56  
4                     W1   2120

I don't think this quite addresses the depth of the nesting in the Excel. There's actually 5 dimensions. Tricky. — idiocrash, Nov 08 '17 at 07:56
I posted the same Excel doc as the screenshots show: https://drive.google.com/open?id=1hZLQPukHcJwfwmkoPwOPPv5_JViwgL6v — idiocrash, Nov 08 '17 at 08:02
Yes, real data follows the same structure as the example data. — idiocrash, Nov 08 '17 at 08:09
...I just realized that my 'How I Want It' tab has incremented values (left over from dragging as a "series" in Excel), but that was due to me not paying attention late at night so the increment step was not needed. In any event, you solved my main problem and one extra one that I mistakenly introduced! Thank you so much! — idiocrash, Nov 08 '17 at 17:44

How to "unpivot" a spreadsheet with multiple, nested dimensions using pandas?

1 Answers1