The data has duplicated values in column H1 & H2. Based on the combinations, I want to remove the duplicates but aggregate the values in column V1 (sum) & V2 (mean).
df_p = pd.DataFrame({'H1':['A1','A1','C1','B1','A1','C1'],
'H2':['X1','Y1','Z1','X1','Y1','Z1'],
'V1':[11,7,8,4,12,6],
'V2':[110,70,80,40,120,60]})
df_p.drop_duplicates(['H1','H2'])
The output is as follows, which ofcourse doesn't aggregate the values as I want.
H1 H2 V1 V2
0 A1 X1 11 110
1 A1 Y1 7 70
2 C1 Z1 8 80
3 B1 X1 4 40
What I want is
H1 H2 V1 V2
0 A1 X1 11 110
1 A1 Y1 19 95
2 C1 Z1 14 70
3 B1 X1 4 40
I know groupby
/ pivottable
could be options. But I have 50+ columns with 20+ str combinations & 30+ value columns that need to be aggregated. I don't know an effective way to process that without selectively going by each of them.