0

I have a dataframe products. Products looks this:

Cust_ID  Prod   Time_of_Sale
A        Bat      1 
A        Ball     2
A        Lego     3
B        Lego     3
B        Lego     9
B        Ball     11
B        Bat      11
B        Bat      13
C        Bat      2
C        Lego     2

I want to change it so that it becomes like this:

Cust_ID  Bat   Bat  Ball Lego Lego
A        1     NaN  2    3    NaN
B        11    13   11   3    9  
C        2     NaN  NaN  2    NaN

I have been playing around with products.groupby() and it is not really leading me anywhere. Any help is appreciated.

The aim is to 'visualize' the order in which each item was purchased by each customer. I have more than 1000 unique Customers.

Edit: I see that a user suggested that I go through How to pivot a dataframe. But this doesn't work because my columns have duplicate values.

Baby_Coder
  • 17
  • 4
  • 1
    Not exactly. I went through the page to see if it does. A problem with pivot columns is that they are usually unique. If you can see my 2nd table, you see that the columns are not unique. I mean pandas wouldn't allow it, I'll have to rename the duplicates to `bat1`, `bat2` but even then, they are picking up values from `bat` of the original table – Baby_Coder Jun 18 '20 at 17:58

1 Answers1

1

This is a little tricky with duplicates on Prod. Basically you need a cumcount and pivot:

new_df = (df.set_index(['Cust_ID','Prod',
                        df.groupby(['Cust_ID', 'Prod']).cumcount()])
     ['Time_of_Sale']
   .unstack(level=(1,2))
   .sort_index(axis=1)
)   
new_df.columns = [x for x,y in new_df.columns]
new_df = new_df.reset_index()

Output:

  Cust_ID  Ball   Bat   Bat  Lego  Lego
0       A   2.0   1.0   NaN   3.0   NaN
1       B  11.0  11.0  13.0   3.0   9.0
2       C   NaN   2.0   NaN   2.0   NaN

Note: Duplicated column names, although supported, should be avoid in Pandas.

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74