-1

Thanks for taking the time to read this :)

I have a data frame that contains 3 columns:

account opportunity product
c1         o1           p1
c1         o1           p2
c1         o1           p3
c2         o2           p2
c2         o2           p3
c2         o4           p1

I want to pivot it to something like this:

account opportunity product
c1          o1         [p1,p2,p3]
c2          o2         [p3,p3]
c2          o4         [p1]

so that I am able to one-hot encode the product field like so

df= df.join(pd.DataFrame(mlb.fit_transform(issues.pop('product')), 
                              columns=mlb.classes_,
                              index=df.index))

The final output will look like this:

 account opportunity p1   p2   p3 
    c1          o1    1    1   1 
    c2          o2    0    1   1
    c2          o4    1    0   0

I have not been able to find out the appropriate way to do the first transformation... Could anyone please help me in this regard? Is it possible through df.pivot?

Devarshi Goswami
  • 1,035
  • 4
  • 11
  • 26
  • 1
    See https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby and https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns – politinsa Apr 22 '21 at 07:42
  • 1
    ``df.groupby(['account', 'opportunity', 'product']).size().unstack('product', fill_value = 0).reset_index()`` – sammywemmy Apr 22 '21 at 07:57

1 Answers1

1

This is crosstab with multiple columns:

pd.crosstab(
    index=[df["account"], df["opportunity"]], 
    columns=df["product"]
).reset_index()
product account opportunity  p1  p2  p3
0            c1          o1   1   1   1
1            c2          o2   0   1   1
2            c2          o4   1   0   0
Erfan
  • 40,971
  • 8
  • 66
  • 78