0

I have pandas dataframe which has two columns - ID and an item associated with ID:

id item
1  apple
2  banana
2  apple
3  avocado

(there are duplicate ID's)

And want to transform it into a table like this:

id  apple banana avocado
------------------------
1     1     0      0
2     1     1      0
3     0     0      1

I thought pandas pivot or pivot table did just this, but I am not getting expected outcomes when trying those functions. So far I have tried

df.pivot(index="id", columns="items") # gives ValueError: Index contains duplicate entries, cannot reshape
df.pivot_table(index="id", columns="items", aggfunc='count') # gives an empty dataframe

What can I do to accomplish the desired result?

Mohit Motwani
  • 4,662
  • 3
  • 17
  • 45
Ach113
  • 1,775
  • 3
  • 18
  • 40
  • 2
    Use `df = pd.get_dummies(df, columns=['item'], prefix='', prefix_sep='').groupby('id', as_index=False).max()` – jezrael Dec 10 '19 at 08:52
  • 1
    @jezrael, I see, thats kinda what I was looking for, since I want ID to be unique in my table, I just did `df.groupby(by='id').sum()` on top of `get_dummies()` – Ach113 Dec 10 '19 at 08:56

0 Answers0