My question may look like a duplicate, since I found several other questions about the same error:
- Pandas: grouping a column on a value and creating new column headings
- Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape
- Pandas pivot produces "ValueError: Index contains duplicate entries, cannot reshape"
I tried all the solutions presented in those posts, but none of them worked. I believe the error may be caused by my dataset's format, which contains strings instead of numbers and possibly duplicate entries. Here is an example of my dataset:
protocol_no | activity | description |
---|---|---|
1586212 | walk | twice a day |
1586212 | drive | 5 km |
1586212 | drive | At least 30 min |
1586212 | sleep | NaN |
1586212 | eat | 1500 calories |
2547852 | walk | NaN |
2547852 | drive | NaN |
2547852 | eat | 3200 calories |
2547852 | eat | Avoid pasta |
2547852 | sleep | At least 10 hours |
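To check the duplicate-entries hypothesis, here is a minimal sketch that rebuilds the example above as a DataFrame and lists every row whose `protocol_no`/`activity` pair occurs more than once. The variable names `df` and `dupes`, and the use of `np.nan` for the NaN cells, are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# The example dataset from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    "protocol_no": [1586212] * 5 + [2547852] * 5,
    "activity": ["walk", "drive", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "eat", "sleep"],
    "description": ["twice a day", "5 km", "At least 30 min", np.nan, "1500 calories",
                    np.nan, np.nan, "3200 calories", "Avoid pasta", "At least 10 hours"],
})

# keep=False flags *all* rows of a repeated (protocol_no, activity) pair,
# not just the second and later occurrences
dupes = df[df.duplicated(subset=["protocol_no", "activity"], keep=False)]
print(dupes)
```

Here the two `drive` rows for 1586212 and the two `eat` rows for 2547852 are flagged. Each of those pairs would have to fill a single cell of the reshaped table with two values.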
The output I'm trying to achieve is:
protocol_no | walk | drive | sleep | eat |
---|---|---|---|---|
1586212 | twice a day | 5 km | NaN | 1500 calories |
2547852 | NaN | NaN | At least 10 hours | 3200 calories |
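One way to reshape the example so that each protocol becomes a single row is to drop the repeated pairs first, keeping the first description, and then call `pivot`. This is a sketch that assumes discarding the later descriptions (such as "At least 30 min" and "Avoid pasta") is acceptable. The names `first_only` and `wide` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "protocol_no": [1586212] * 5 + [2547852] * 5,
    "activity": ["walk", "drive", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "eat", "sleep"],
    "description": ["twice a day", "5 km", "At least 30 min", np.nan, "1500 calories",
                    np.nan, np.nan, "3200 calories", "Avoid pasta", "At least 10 hours"],
})

# Keep only the first row for each (protocol_no, activity) pair,
# so every cell of the pivoted table receives exactly one value
first_only = df.drop_duplicates(subset=["protocol_no", "activity"], keep="first")

wide = (
    first_only
    .pivot(index="protocol_no", columns="activity", values="description")
    [["walk", "drive", "sleep", "eat"]]  # pivot sorts columns; restore the desired order
)
print(wide)
```

`drop_duplicates` keeps the first row even if its description is NaN. This matters only if a repeated pair starts with a missing value, which does not happen in this example.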
I tried using `pivot` and `pivot_table` with code like this:

```python
df.pivot(index="protocol_no", columns="activity", values="description")
```

But I'm still getting this error:

```
ValueError: Index contains duplicate entries, cannot reshape
```
I have no idea what is going wrong, so any help would be appreciated!
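The following small sketch separates the two suspected causes, using a few rows taken from the example. Text values pivot without trouble as long as each (protocol_no, activity) pair is unique. Adding a second `drive` row for the same protocol reproduces the error. The variable names are illustrative:

```python
import pandas as pd

# A few rows from the example; every (protocol_no, activity) pair is unique
ok = pd.DataFrame({
    "protocol_no": [1586212, 1586212, 2547852],
    "activity": ["walk", "drive", "eat"],
    "description": ["twice a day", "5 km", "3200 calories"],
})

# String values reshape fine when no cell has to hold two values
ok_wide = ok.pivot(index="protocol_no", columns="activity", values="description")
print(ok_wide)

# A second "drive" row for 1586212 means one cell would need two values
extra = pd.DataFrame({
    "protocol_no": [1586212],
    "activity": ["drive"],
    "description": ["At least 30 min"],
})
bad = pd.concat([ok, extra], ignore_index=True)

try:
    bad.pivot(index="protocol_no", columns="activity", values="description")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Strings are therefore not the problem for `pivot`. The duplicated pairs are.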
EDIT:
I noticed that my data contains duplicate entries, as the error message and users @DYZ and @SeaBean pointed out. So I've edited the dataset example and provided the correct answer for my dataset as well. I hope it helps someone.
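If the duplicate descriptions should be kept rather than dropped, `pivot_table` can aggregate them instead of failing. Its default `aggfunc` is `"mean"`, which cannot average text. This sketch therefore passes a custom function, assuming that joining the non-null descriptions with `"; "` is acceptable. Passing `aggfunc="first"` would instead keep only the first non-null description per pair. The name `wide_all` and the separator are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "protocol_no": [1586212] * 5 + [2547852] * 5,
    "activity": ["walk", "drive", "drive", "sleep", "eat",
                 "walk", "drive", "eat", "eat", "sleep"],
    "description": ["twice a day", "5 km", "At least 30 min", np.nan, "1500 calories",
                    np.nan, np.nan, "3200 calories", "Avoid pasta", "At least 10 hours"],
})

# Join every non-null description of a (protocol_no, activity) pair;
# an all-NaN pair yields "" from join, which `or np.nan` turns back into NaN
wide_all = df.pivot_table(
    index="protocol_no",
    columns="activity",
    values="description",
    aggfunc=lambda s: "; ".join(s.dropna()) or np.nan,
)
print(wide_all)
```

Unlike `pivot`, `pivot_table` groups by the index and column keys first, so repeated pairs are combined by `aggfunc` rather than raising the "duplicate entries" error.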