Pandas Pivot with Strings- ValueError: Index contains duplicate entries, cannot reshape

Question

Given this data frame:

import pandas as pd
df=pd.DataFrame({'Field':['a','b','a','b'],'Value':['aa','bb','cc','dd'],
'indexer':[0,0,1,1]})

df
  Field Value  indexer
0     a    aa        0
1     b    bb        0
2     a    cc        1
3     b    dd        1

I want to produce a dataframe like this:

indexer   a   b
0         aa  bb
1         cc  dd

I've seen answers on how to achieve this when the value field is numeric, but I cannot seem to get this working with string data.

I've tried df.groupby('indexer') but cannot seem to display it or get it into a dataframe. I've found answers for these, but they assume float or integer values.

Thanks in advance!

jezrael · Accepted Answer · 2017-12-17T19:51:55.547

3

There is problem your real data contains duplicates in pairs indexer with Field, so is necessary some aggregate function like ', '.join, because working with strings:

df = df.groupby(['indexer', 'Field'])['Value'].apply(', '.join).unstack()
print (df)
Field     a   b
indexer        
0        aa  bb
1        cc  dd

Or:

df = df.pivot_table(index='indexer', columns='Field', values='Value', aggfunc=','.join)

edited Dec 17 '17 at 19:51

answered Dec 17 '17 at 19:48

jezrael

822,522
95
1,334
1,252

1

Thanks :-) , I did not notice that require join, when I saw your answer, thank you my friend :-) – BENY Dec 17 '17 at 19:55

score 0 · Answer 2 · answered Jun 28 '20 at 16:56

0

You should use crosstab.

Example:

pd.crosstab(index=df.indexer, columns=df.field, aggfunc=lambda v:v)

answered Jun 28 '20 at 16:56

Amnon

2,212
1
19
35

Pandas Pivot with Strings- ValueError: Index contains duplicate entries, cannot reshape

2 Answers2