I have a pandas dataframe in a transactional format:

id  purchased_item
1   apple
1   banana
1   carrot
2   banana
3   apple
4   apple
4   carrot
4   diet coke
5   banana
5   carrot
6   banana
6   carrot

I would like to convert this to the following:

[['apple', 'banana', 'carrot'],
 ['banana'],
 ['apple'],
 ['apple', 'carrot', 'diet coke'],
 ['banana', 'carrot'],
 ['banana', 'carrot']]

I have tried this:

df.groupby(['id'])['purchased_item'].apply(list)

The output looks like:

id
1                 [apple, banana, carrot]
2                                [banana]
3                                 [apple]
4              [apple, carrot, diet coke]
5                        [banana, carrot]
6                        [banana, carrot]

What should I do next? Or is there a different approach? Thanks a lot for the help.

kevin
  • This question is answered here: http://stackoverflow.com/questions/34080979/convert-pandas-dataframe-to-a-list#34081130 – Joe T. Boka Dec 04 '15 at 06:20
  • To me, this is a different question because the original data are in a different format. So I am looking for a different solution. – kevin Dec 04 '15 at 06:23
  • I found a solution from here http://stackoverflow.com/questions/15112234/converting-dataframe-into-a-list – kevin Dec 04 '15 at 06:36

2 Answers

The solution you mentioned in your comment, taken from an answer to the linked question, works:

df.groupby(['id'])['purchased_item'].apply(list).values.tolist()

In [434]: df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
Out[434]:
[['apple', 'banana', 'carrot'],
 ['banana'],
 ['apple'],
 ['apple', 'carrot', 'diet coke'],
 ['banana', 'carrot'],
 ['banana', 'carrot']]
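
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the frame is built by hand from the sample data in the question (the original post does not show how `df` was created), and it uses `Series.tolist()` directly, which should be equivalent to `.values.tolist()` here.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (an assumption about how df was created).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "purchased_item": ["apple", "banana", "carrot", "banana", "apple",
                       "apple", "carrot", "diet coke", "banana", "carrot",
                       "banana", "carrot"],
})

# Group rows by id, collect each group's items into a Python list,
# then unwrap the resulting Series into a plain list of lists.
baskets = df.groupby("id")["purchased_item"].apply(list).tolist()
print(baskets)
```

The groups come back sorted by `id`, and the items inside each group keep their original row order.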

EDIT

A quick timing comparison with @Colonel Beauvel's solution:

In [472]: %timeit [gr['purchased_item'].tolist() for n, gr in df.groupby('id')]
100 loops, best of 3: 2.1 ms per loop

In [473]: %timeit df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
1000 loops, best of 3: 1.36 ms per loop
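
The `%timeit` magic only works inside IPython. Outside it, the same comparison can be sketched with the standard `timeit` module. This is a rough sketch on the question's tiny sample frame (rebuilt by hand, an assumption); the helper names `with_comprehension` and `with_apply` are made up for illustration, and the numbers will vary with machine, pandas version, and data size.

```python
import timeit

import pandas as pd

# Sample data from the question, rebuilt by hand (an assumption).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "purchased_item": ["apple", "banana", "carrot", "banana", "apple",
                       "apple", "carrot", "diet coke", "banana", "carrot",
                       "banana", "carrot"],
})


def with_comprehension():
    # Hypothetical helper: iterate over the groups and convert each one to a list.
    return [gr["purchased_item"].tolist() for _, gr in df.groupby("id")]


def with_apply():
    # Hypothetical helper: let pandas apply list() to each group.
    return df.groupby(["id"])["purchased_item"].apply(list).values.tolist()


n = 200  # number of repetitions per measurement
t_comp = timeit.timeit(with_comprehension, number=n) / n
t_apply = timeit.timeit(with_apply, number=n) / n

print(f"list comprehension: {t_comp * 1e3:.3f} ms per call")
print(f"apply(list):        {t_apply * 1e3:.3f} ms per call")
```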
Anton Protopopov

I would rather use a different solution based on a list comprehension:

[gr['purchased_item'].tolist() for n, gr in df.groupby('id')]

Out[9]:
[['apple', 'banana', 'carrot'],
 ['banana'],
 ['apple'],
 ['apple', 'carrot', 'diet coke'],
 ['banana', 'carrot'],
 ['banana', 'carrot']]
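
To see why this works: iterating over a groupby object yields one `(key, sub-DataFrame)` pair per group. Here is the comprehension unrolled into a plain loop, as a sketch on the question's sample data (rebuilt by hand, which is an assumption; the `keys` list is only there for illustration).

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (an assumption).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "purchased_item": ["apple", "banana", "carrot", "banana", "apple",
                       "apple", "carrot", "diet coke", "banana", "carrot",
                       "banana", "carrot"],
})

keys = []     # the id of each group, kept only for illustration
baskets = []  # one list of purchased items per id

# Each iteration yields the group key and the rows belonging to that key.
for key, group in df.groupby("id"):
    keys.append(key)
    baskets.append(group["purchased_item"].tolist())

print(keys)
print(baskets)
```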
Colonel Beauvel