0

Sorry if this is a repeated question, but I have struggled to find an existing thread with a solution that works for my problem.

I am working with a dataset that looks something like this

df = pd.DataFrame(data={"product":["crisps", "crisps", "crisps",
                                   "bread", "bread", "bread",
                                   "pasta","pasta"], 
                   "ingredients": ["potato", "oil", "salt", 
                                   "flour", "salt", "water", 
                                   "flour", "eggs"]})

Raw dataset format:
Raw dataset format

But I need the dataset in this format

Rolled up dataset:
Rolled up dataset

I know this can be done using an SQL query, but is there a way of doing this in python/pandas? The actual dataset I am working with contains over 100,000 different food products so the solution will need to be scalable.

Absolutely any advice would be much appreciated!

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
mgg7199
  • 3
  • 1

1 Answers1

0

Try:

df.groupby('product', as_index=False)[['ingredients']].agg(list)

Output:

  product           ingredients
0   bread  [flour, salt, water]
1  crisps   [potato, oil, salt]
2   pasta         [flour, eggs]
Scott Boston
  • 147,308
  • 15
  • 139
  • 187