1

I have the following dataframe:

df = pd.DataFrame({'recipe': ['a', 'a', 'a', 'a', 'a','a', 'b', 'b', 'b', 'b', 'b','b'],
                   'product': ['A', 'B', 'C', 'D', 'E', 'F', 'A', 'B', 'G', 'I', 'K', 'F']})

And I want to pivot the dataframe for the output to have one row for each recipe and multiple columns for each product, ex:

recipe  p1  p2  p3  p4  p5  p6
a       A   B   C   D   E   F
b       A   B   G   I   K   F

Also, there are multiple products, but I want the top 6 based on another column that gives a ranking for how relevant the product is

Maria Nazari
  • 660
  • 1
  • 9
  • 27
  • 1
    What code you tried so far? – Kannan Kandasamy Jan 30 '19 at 20:52
  • @KannanKandasamy I tried using pivot, but it doesn't give me what I am looking for ... I'm playing around using metl and pivot but not luck yet – Maria Nazari Jan 30 '19 at 21:01
  • Maybe you will find something like this of help: ```pd.get_dummies(df, columns=["product"]).groupby("recipe").sum(min_count=1)``` – gorjan Jan 30 '19 at 21:03
  • @gorjan your code is interesting, but I have thousands of products names, and only want the top 10. Your method will not work. I also want the columns to be labeled product 1 - 10 and with the actual product names in the cells. – Maria Nazari Jan 30 '19 at 21:10
  • What do you mean by top 10? – gorjan Jan 30 '19 at 21:15
  • @gorjan so I will have 100s of products matched to each recipe, and I only want the top n of these based on their another column called cos_score. I don't want 100s of product columns, I just want the top n products for each recipe. – Maria Nazari Jan 30 '19 at 21:28

1 Answers1

4

This is actually a groupby and unnesting problem.

u = df.groupby('recipe')['product'].apply(list)
pd.DataFrame(u.tolist(), index=u.index).rename(lambda x: 'p{}'.format(x+1), axis=1)

       p1 p2 p3 p4 p5 p6
recipe                  
a       A  B  C  D  E  F
b       A  B  G  I  K  F
cs95
  • 379,657
  • 97
  • 704
  • 746