1

I have a Pandas DataFrame where respondents answer the same question over multiple sales accounts. My input Dataframe is of the following format

df = pd.DataFrame({"Sales_Acc1":[100,300],
              "Sales_Acc2":[200,500],
              "Time_Acc1":[2,5],
              "Time_acc2":[6,9],
              "Acc_Number_acc1":[1001,1005],
              "Acc_Number_acc2":[1009,1010]},
             index=["John","Dave"])
df
>>> Sales_Acc1  Sales_Acc2  Time_Acc1   Time_acc2   Acc_Number_acc1 Acc_Number_acc2
John    100     200          2          6           1001            1009
Dave    300     500          5          9           1005            1010

I want to pivot this so that each account would have its own row. My desired end Dataframe would look like:

df
>>> AccountNumber   Rep   Sales   Time 
     1001           John  100     2
     1005           John  300     6
     1009           Dave  200     5
     1010           Dave  500     9

I have tried using melt as well as pivot but I cannot figure it out. I appreciate any assistance.

2 Answers2

3

This is a wide_to_long problem given your column are in the format 'stub_SomeSuffix'. Because you have some inconsistent cases we'll make everything lower. We also need to remove the names from the index since wide_to_long requires columns.

df.columns = df.columns.str.lower()

df = (pd.wide_to_long(df.rename_axis('Rep').reset_index(),    # Use Rep as index
                      i='Rep',                                # index of output
                      j='will_drop',                          # Suffix labels  
                      stubnames=['sales', 'acc_number', 'time'],
                      sep='_', 
                      suffix='.*') 
        .reset_index()
        .drop(columns='will_drop'))

#    Rep  sales  acc_number  time
#0  John    100        1001     2
#1  Dave    300        1005     5
#2  John    200        1009     6
#3  Dave    500        1010     9

If you aren't a fan of the endless arguments required with wide_to_long we can instead create a simple MultiIndex on the columns and then this is a stack. Again because of inconsistent casing we make all the columns fully lower case.

df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.lower().str.rsplit('_', n=1)))
#     sales      time      acc_number      
#      acc1 acc2 acc1 acc2       acc1  acc2
#John   100  200    2    6       1001  1009
#Dave   300  500    5    9       1005  1010

df.stack(-1).reset_index(-1, drop=True)
#      acc_number  sales  time
#John        1001    100     2
#John        1009    200     6
#Dave        1005    300     5
#Dave        1010    500     9
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • Can you please provide a resource from where I can learn longtowide properly? This is the first time I am reading this. – user27286 Apr 07 '20 at 19:12
  • In front of your answer, my answer seems so stupid. I guess I should delete it. I will upvote yours. Lots of things I am getting to know. – user27286 Apr 07 '20 at 19:15
  • @user27286 Actually no, yours isn't bad just a bit Manual. I'll add another option to mine to show you how you might do this with a bit less manual writing. – ALollz Apr 07 '20 at 20:03
  • Thanks for the response. What does the * modifier do infront of df? – Victor Nogueira Apr 07 '20 at 20:20
  • @VictorNogueira is essentially unpacks the elements. There's a better explanation in https://stackoverflow.com/questions/2921847/what-does-the-star-operator-mean-in-a-function-call than I can provide here in the comments. On second thought, it may be a bit clear to construct the MultiIndex from tuples. You could do `pd.MultiIndex.from_tuples(df.columns.str.lower().str.rsplit('_', n=1).tolist())` instead. – ALollz Apr 07 '20 at 20:28
  • @ALollz.: Thanks for the support. I read your answer now. I guess somebody disliked my answer. By the way my answer is working..but I guess it can be better that's why the DV. It's a good learning experience. So thanks again. – user27286 Apr 07 '20 at 21:03
  • @ALollz I had one more question about the first argument to `pd.wide_to_long`, I believe on the original post you had `pd.wide_to_longdf.rename_axis("rep").reset_index().reset_index()` which worked, but I was wondering why there was two. Did you edit it or did I just miss something?Thank you for your answer. – Victor Nogueira Apr 07 '20 at 23:25
  • @VictorNogueira Hmm if there were 2 reset_indices then it was a typo on my part. It won't break anything but it's unnecessary. We just need a single `reset_index` so that the names become a column as the `i` argument can only accept column labels, not anything in the index. (The rename_axis is just a nice way to give it a name instead of the default 'index'` it would get without that) – ALollz Apr 07 '20 at 23:38
2
df1=df[['Sales_Acc1','Time_Acc1','Acc_Number_acc1']]
df2=df[['Sales_Acc2','Time_acc2','Acc_Number_acc2']]
df1.columns=['Sales_Acc','Time_Acc','Acc_Number']
df2.columns=['Sales_Acc','Time_Acc','Acc_Number']
df3 = df1.append(df2)
df3.index.names = ['Rep']

My solution is as simple as this and it would work in this case we are basically putting columns in different rows.

First selecting the relevant columns and then renaming the column names and appending it to the other to get the final result.

This is the full code:

import pandas as pd
df = pd.DataFrame({"Sales_Acc1":[100,300],
              "Sales_Acc2":[200,500],
              "Time_Acc1":[2,5],
              "Time_acc2":[6,9],
              "Acc_Number_acc1":[1001,1005],
              "Acc_Number_acc2":[1009,1010]},
             index=["John","Dave"])
df1=df[['Sales_Acc1','Time_Acc1','Acc_Number_acc1']]
df2=df[['Sales_Acc2','Time_acc2','Acc_Number_acc2']]
df1.columns=['Sales_Acc','Time_Acc','Acc_Number']
df2.columns=['Sales_Acc','Time_Acc','Acc_Number']
df3 = df1.append(df2)
df3.index.names = ['Rep']
df3.head()

Output:

enter image description here

user27286
  • 486
  • 4
  • 11
  • You had a small typo in your answer where you never defined `df3` before you referenced it, leading to a `NameError`. (I made the small adjustment). My guess is that's what caused the original downvote – ALollz Apr 07 '20 at 21:10
  • @ALollz.: I have run this code. `df3` is a new variable. This code runs. And gives the following output. I have run it again. Are you sure that it is the problem? – user27286 Apr 07 '20 at 22:15
  • @ALollz.: I see...you have edited it. So thank you very much. I didn't even understand...Thanks a lot for your help and for letting me know where I went wrong. Thanks again. – user27286 Apr 07 '20 at 22:45