python pandas reshape data from long to wide

Question

I have a dataframe in the following format that I am trying to reshape into wide format in pandas.

But I am getting the error Index contains duplicate entries, cannot reshape.

df:

id	status	value
1	item_ordered	complete
1	item_received	complete
1	item_setup	complete
2	item_ordered	complete
2	item_setup	complete
2	item_setup

The format that I am trying to reshape to:

id	item_ordered	item_received	item_setup
1	complete	complete	complete
2	complete	complete

the second last row should be 'item_received' per the expected result — Naveed, Sep 02 '22 at 14:19

ThePyGuy · Accepted Answer · 2022-09-02T14:34:04.747

1

Group the dataframe by id and status columns, then take the first values for value column, finally unstack the resulting series:

>>> df.groupby(['id', 'status']).value.first().unstack().reset_index()

status  id item_ordered item_received item_setup
0        1     complete      complete   complete
1        2     complete           NaN   complete

edited Sep 02 '22 at 14:34

answered Sep 02 '22 at 14:27

ThePyGuy

17,779
5
18
45

The id column seems to be missing from the final df – Datamaniac Sep 02 '22 at 14:33
@Datamaniac just add `reset_index()` at last – ThePyGuy Sep 02 '22 at 14:33
Look at the updated answer – ThePyGuy Sep 02 '22 at 14:34

score 0 · Answer 2 · answered Sep 02 '22 at 14:15

0

Try:

df.dropna().pivot(index='id', columns='status', values='value')

status item_ordered item_received item_setup
id                                          
1          complete      complete   complete
2          complete           NaN   complete

The reason of an error is that for id == 2, the column status contains two indentical values (which will be used for the column index): item_setup.

answered Sep 02 '22 at 14:15

Vladimir Fokow

3,728
2
5
27

ValueError: Index contains duplicate entries, cannot reshape - I get the same error using this code – Datamaniac Sep 02 '22 at 14:28
@Datamaniac All your `status` values must be unique for each `id`. I haven't seen your full data - so I cannot tell you how you want to transform your data to reach this requirement ... For example, take first, last, average, etc. In my answer I have assumed that removing all nans would be sufficient (based on your example in the question) – Vladimir Fokow Sep 02 '22 at 14:28

Naveed · Answer 3 · 2022-09-02T14:29:09.120

the second last row should be 'item_received' per the expected result

df.fillna('').pivot(index='id', columns='status')

    value
status  item_ordered    item_received   item_setup
id          
1           complete         complete   complete
2           complete         complete

if there are multiple statuses per id

df.pivot (index=['id','id2'], columns='status').fillna('').droplevel(1)

    id  value
status      item_ordered    item_received   item_setup
0   1           complete         complete     complete
1   2           complete                      complete
2   2

data used

data={'id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
 'status': {0: 'item_ordered ',
  1: 'item_received ',
  2: 'item_setup ',
  3: 'item_ordered ',
  4: 'item_received ',
  5: 'item_setup '},
 'value': {0: 'complete',
  1: 'complete',
  2: 'complete',
  3: 'complete',
  4: 'complete',
  5: nan}}```

@VladimirFokow, the expected result don't match the provided table. it seems to have a typo — Naveed, Sep 02 '22 at 14:21

python pandas reshape data from long to wide

3 Answers3