0

I am using dataframe pivot function as below, but I am getting below error:

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Serialno': [1,2,2,1,1,2,2,1]    
                   'position': ['G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 7, 7, 9, 4, 9, 9, 12]})
ValueError: Index contains duplicate entries, cannot reshape

Code below:

df.pivot_table(index='team', columns='position', values='points')

When I researched this online I found the root cause is duplicate in data for same index, column combination and below is the fix suggested.

df.pivot_table(index='team', columns='position', values='points', aggfunc='sum')

But in my scenario, I need to apply aggregate function on a new column (which is not part of columns, values). I have a serialno column and in case of duplicate I need to pick the value which has higher serialno.

in above example I want this output :

position    F    G
team        
A           7   7
B           9   9

How can I implement this in above code?

PythonDeveloper
  • 289
  • 1
  • 4
  • 24
  • 1
    post the input sample and expected result – RomanPerekhrest Sep 01 '23 at 18:29
  • Please make a [mre], meaning some example input and expected output. It might also help to post the [full error message with traceback](//meta.stackoverflow.com/q/359146/4518341). For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Sep 01 '23 at 18:38
  • @RomanPerekhrest added example and sample data. – PythonDeveloper Sep 01 '23 at 18:51

1 Answers1

0

You could use groupby to only keep the relevant rows (with the maximum Serialno) before the pivot:

output = (df[df["Serialno"].eq(df.groupby("team")["Serialno"].transform("max"))]
          .pivot(index="team",columns="position",values="points")
          )

>>> output
position  F  G
team          
A         7  7
B         9  9
not_speshal
  • 22,093
  • 2
  • 15
  • 30