dataframe Pivot function duplicate entries error

Question

I am using dataframe pivot function as below, but I am getting below error:

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Serialno': [1,2,2,1,1,2,2,1]    
                   'position': ['G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 7, 7, 9, 4, 9, 9, 12]})

ValueError: Index contains duplicate entries, cannot reshape

Code below:

df.pivot_table(index='team', columns='position', values='points')

When I researched this online I found the root cause is duplicate in data for same index, column combination and below is the fix suggested.

df.pivot_table(index='team', columns='position', values='points', aggfunc='sum')

But in my scenario, I need to apply aggregate function on a new column (which is not part of columns, values). I have a serialno column and in case of duplicate I need to pick the value which has higher serialno.

in above example I want this output :

position    F    G
team        
A           7   7
B           9   9

How can I implement this in above code?

Please make a [mre], meaning some example input and expected output. It might also help to post the [full error message with traceback](//meta.stackoverflow.com/q/359146/4518341). For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). — wjandrea, Sep 01 '23 at 18:38

score 0 · Answer 1 · answered Sep 01 '23 at 18:53

You could use groupby to only keep the relevant rows (with the maximum Serialno) before the pivot:

output = (df[df["Serialno"].eq(df.groupby("team")["Serialno"].transform("max"))]
          .pivot(index="team",columns="position",values="points")
          )

>>> output
position  F  G
team          
A         7  7
B         9  9

dataframe Pivot function duplicate entries error

1 Answers1