0

The objective is to unpivot the following table

  Activity General  m1   t1  m2   t2  m3   t3
0       P1      AA  A1  TA1  A2  TA2  A3  TA3
1       P2      BB  B1  TB1  B2  TB2  B3  TB3

into the following format

    Activity   General M Task
0   P1 AA A1  TA1 
1   P1 AA A2  TA2 
2   P1 AA A3  TA3 
3   P2 BB B1  TB1 
4   P2 BB B2  TB2
5   P2 BB B3  TB3

Based on some reading, the module melt can be used to achieved the desired objective.

import pandas as pd
from pandas import DataFrame
list_me = [['P1','AA','A1','TA1','A2','TA2','A3','TA3'],
           ['P2', 'BB', 'B1', 'TB1', 'B2', 'TB2', 'B3', 'TB3']]

df = DataFrame (list_me)
df.columns = ['Activity','General','m1','t1','m2','t2','m3','t3']   
melted_form=pd.melt(df, id_vars=['Activity','General'],var_name='m1',value_name='new_col')

However, most of the example found on the net was solely to tackle single column. I am thinking of using for loop to loop the m1 m2 and m3 and merge the result concurrently. This is because, in actually, the pair of m_i and t_i is at the range of hundreds (where i is the index)

But, I wonder there are more efficient approach than looping.

p.s. I had tried suggestion as in the OP, but, it does not give the intended output

mpx
  • 3,081
  • 2
  • 26
  • 56

1 Answers1

1

If I understand your question, you could use pd.wide_to_long :

    (pd.wide_to_long(df, 
                    i=["Activity", "General"], 
                    stubnames=["t", "m"], j="number")
    .set_axis(["Task", "M"], axis="columns")
    .droplevel(-1).reset_index()
     )

    Activity    General Task    M
0      P1       AA      TA1     A1
1      P1       AA      TA2     A2
2      P1       AA      TA3     A3
3      P2       BB      TB1     B1
4      P2       BB      TB2     B2
5      P2       BB      TB3     B3
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • Hi @sammy, yes understand the question correctly. It would be appreciated if you can explain more about the selection of `t` and `m` in your answer. – mpx Oct 12 '20 at 10:49
  • 1
    I included the link to the function in my answer. It basically looks for columns that start with `t` and `m` , and separates them into individual columns; they have to be prefixes. – sammywemmy Oct 12 '20 at 10:52
  • Hi Sam, sorry for not include this info much more earlier. But, how handle the `stubnames` if the columns is in these format `'Jan 2020','Feb 2020','Mar 2020','Apr 2020','May 2020','Jun 2020'`. This ofcourse range from Jan to December from the year 2020 to say 2050. Appreciate for any hints. At this moment, I am thinking of changing the column name so that it start with say, the letter `t` or `m`.For example `m Jan 2020` But is this advisable? – mpx Oct 12 '20 at 11:02
  • 1
    no problem. the same concept applies; however, you could create a new question and let's see if a simpler solution is possible. – sammywemmy Oct 12 '20 at 11:03