1

I have a dataframe in the following format that I am trying to reshape into wide format in pandas.

But I am getting the error Index contains duplicate entries, cannot reshape.

df:

id status value
1 item_ordered complete
1 item_received complete
1 item_setup complete
2 item_ordered complete
2 item_setup complete
2 item_setup

The format that I am trying to reshape to:

id item_ordered item_received item_setup
1 complete complete complete
2 complete complete
Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27
Datamaniac
  • 171
  • 2
  • 9

3 Answers3

1

Group the dataframe by id and status columns, then take the first values for value column, finally unstack the resulting series:

>>> df.groupby(['id', 'status']).value.first().unstack().reset_index()

status  id item_ordered item_received item_setup
0        1     complete      complete   complete
1        2     complete           NaN   complete
ThePyGuy
  • 17,779
  • 5
  • 18
  • 45
0

Try:

df.dropna().pivot(index='id', columns='status', values='value')
status item_ordered item_received item_setup
id                                          
1          complete      complete   complete
2          complete           NaN   complete

The reason of an error is that for id == 2, the column status contains two indentical values (which will be used for the column index): item_setup.

Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27
  • ValueError: Index contains duplicate entries, cannot reshape - I get the same error using this code – Datamaniac Sep 02 '22 at 14:28
  • @Datamaniac All your `status` values must be unique for each `id`. I haven't seen your full data - so I cannot tell you how you want to transform your data to reach this requirement ... For example, take first, last, average, etc. In my answer I have assumed that removing all nans would be sufficient (based on your example in the question) – Vladimir Fokow Sep 02 '22 at 14:28
0

the second last row should be 'item_received' per the expected result

df.fillna('').pivot(index='id', columns='status')
    value
status  item_ordered    item_received   item_setup
id          
1           complete         complete   complete
2           complete         complete   

if there are multiple statuses per id

df.pivot (index=['id','id2'], columns='status').fillna('').droplevel(1)
    id  value
status      item_ordered    item_received   item_setup
0   1           complete         complete     complete
1   2           complete                      complete
2   2           

data used

data={'id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
 'status': {0: 'item_ordered ',
  1: 'item_received ',
  2: 'item_setup ',
  3: 'item_ordered ',
  4: 'item_received ',
  5: 'item_setup '},
 'value': {0: 'complete',
  1: 'complete',
  2: 'complete',
  3: 'complete',
  4: 'complete',
  5: nan}}```
Naveed
  • 11,495
  • 2
  • 14
  • 21
  • @VladimirFokow, the expected result don't match the provided table. it seems to have a typo – Naveed Sep 02 '22 at 14:21