Pandas pivot produces "ValueError: Index contains duplicate entries, cannot reshape"

Question

I have a pandas table formatted as following:

  anger_metric  metric_name angle_value
0   71.0991 roll    14.6832
1   71.0991 yaw     0.7009
2   71.0991 pitch   22.5075
3   90.1341 roll    4.8566
4   90.1341 yaw     6.4458
5   90.1341 pitch   -10.1930

I need to create a view of this that pivots it to sth like this:

  anger_metric  roll yaw pitch 
0   71.0991     14.6832 0.7009 22.5075
1   90.1341     4.8566  6.4458 -10.1930

Here is my code:

df2= results.pivot(index='anger_metric', columns='metric_name', values='angle_value')
# results is the pnadas table/list

I get the following error:

ValueError: Index contains duplicate entries, cannot reshape

How to handle this?

Need aggregate values by `groupby` or `pivot_table` or need remove dupes by `drop_duplicates` - check [this](https://stackoverflow.com/q/11232275/2901002) — jezrael, Oct 02 '17 at 15:28
Maybe you can check [this](https://stackoverflow.com/a/44847778/2901002) for better explanation of problem ans solutions. — jezrael, Oct 02 '17 at 15:31
@jezrael if that case , he should groupby and sum 1st right ? — BENY, Oct 02 '17 at 15:31
`groupby` + some aggregate function like `mean`, `sum`, `first` - it depends of OP what need — jezrael, Oct 02 '17 at 15:33
If you have multiple values for a row-column pair, you cannot pivot. For example if you have multiple entries for row=0 and column=anger_metric what should be in the corresponding cell? That's for you to decide. You can use an aggregation function, like mean, which is the default for pivot_table or drop the duplicates first like jezrael mentioned. — ayhan, Oct 02 '17 at 15:33

cs95 · Accepted Answer · 2019-04-09T22:22:22.100

38

Try pivot_table:

df
   anger_metric metric_name  angle_value
0       71.0991        roll      14.6832
1       71.0991         yaw       0.7009
2       71.0991       pitch      22.5075
3       90.1341        roll       4.8566
4       90.1341         yaw       6.4458
5       90.1341       pitch     -10.1930

result = df.pivot_table(index='anger_metric', 
                        columns='metric_name', 
                        values='angle_value')
result.columns.name = None

result
                pitch     roll     yaw
anger_metric                          
71.0991       22.5075  14.6832  0.7009
90.1341      -10.1930   4.8566  6.4458

edited Apr 09 '19 at 22:22

answered Oct 02 '17 at 15:25

cs95

379,657
97
704
746

how can the resultant view be sliced? to show the first result for example? – A_Matar Oct 02 '17 at 15:39
@A_Matar `result.iloc[0]`? – cs95 Oct 02 '17 at 15:42
when the value is a text column, you can join the string like this.. `df.pivot_table(..., aggfunc=lambda x: ':::'.join(x)` or `first` etc. – ihightower Nov 10 '20 at 09:02

BENY · Answer 2 · 2017-10-02T15:33:19.080

2

By using unstack

df.groupby(['anger_metric','metric_name'])['angle_value'].sum().unstack(-1)# you can using `mean` instead of `sum`
Out[433]: 
metric_name     pitch     roll     yaw
anger_metric                          
71.0991       22.5075  14.6832  0.7009
90.1341      -10.1930   4.8566  6.4458

edited Oct 02 '17 at 15:33

answered Oct 02 '17 at 15:28

BENY

317,841
20
164
234

why is this faster? @jezrael – A_Matar Oct 02 '17 at 15:41
Hard question, I think different implementaion of function. But it depends also of youtr real data, the best test it. – jezrael Oct 02 '17 at 15:42

Pandas pivot produces "ValueError: Index contains duplicate entries, cannot reshape"

2 Answers2

Linked