12

I have a pandas table formatted as following:

  anger_metric  metric_name angle_value
0   71.0991 roll    14.6832
1   71.0991 yaw     0.7009
2   71.0991 pitch   22.5075
3   90.1341 roll    4.8566
4   90.1341 yaw     6.4458
5   90.1341 pitch   -10.1930

I need to create a view of this that pivots it to sth like this:

  anger_metric  roll yaw pitch 
0   71.0991     14.6832 0.7009 22.5075
1   90.1341     4.8566  6.4458 -10.1930

Here is my code:

df2= results.pivot(index='anger_metric', columns='metric_name', values='angle_value')
# results is the pnadas table/list

I get the following error:

ValueError: Index contains duplicate entries, cannot reshape

How to handle this?

cs95
  • 379,657
  • 97
  • 704
  • 746
A_Matar
  • 2,210
  • 3
  • 31
  • 53
  • 3
    Need aggregate values by `groupby` or `pivot_table` or need remove dupes by `drop_duplicates` - check [this](https://stackoverflow.com/q/11232275/2901002) – jezrael Oct 02 '17 at 15:28
  • Maybe you can check [this](https://stackoverflow.com/a/44847778/2901002) for better explanation of problem ans solutions. – jezrael Oct 02 '17 at 15:31
  • @jezrael if that case , he should groupby and sum 1st right ? – BENY Oct 02 '17 at 15:31
  • `groupby` + some aggregate function like `mean`, `sum`, `first` - it depends of OP what need – jezrael Oct 02 '17 at 15:33
  • 1
    If you have multiple values for a row-column pair, you cannot pivot. For example if you have multiple entries for row=0 and column=anger_metric what should be in the corresponding cell? That's for you to decide. You can use an aggregation function, like mean, which is the default for pivot_table or drop the duplicates first like jezrael mentioned. – ayhan Oct 02 '17 at 15:33

2 Answers2

38

Try pivot_table:

df
   anger_metric metric_name  angle_value
0       71.0991        roll      14.6832
1       71.0991         yaw       0.7009
2       71.0991       pitch      22.5075
3       90.1341        roll       4.8566
4       90.1341         yaw       6.4458
5       90.1341       pitch     -10.1930

result = df.pivot_table(index='anger_metric', 
                        columns='metric_name', 
                        values='angle_value')
result.columns.name = None

result
                pitch     roll     yaw
anger_metric                          
71.0991       22.5075  14.6832  0.7009
90.1341      -10.1930   4.8566  6.4458
cs95
  • 379,657
  • 97
  • 704
  • 746
2

By using unstack

df.groupby(['anger_metric','metric_name'])['angle_value'].sum().unstack(-1)# you can using `mean` instead of `sum`
Out[433]: 
metric_name     pitch     roll     yaw
anger_metric                          
71.0991       22.5075  14.6832  0.7009
90.1341      -10.1930   4.8566  6.4458
BENY
  • 317,841
  • 20
  • 164
  • 234