I have a df of fruit purchases sorted by date (most recent first). I want to drop duplicates by fruit, but how each column is collapsed depends on the column. The solution needs to generalise to more columns, but the 3 types of operations remain the same:
For each fruit:
- price column should be the highest sold price
- date, place and colour columns should be the most recent value that isn't NaN
- qty should be the average number sold
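One possible way to make these three rules generalise is to map each column to a pandas aggregation name and pass that mapping to `groupby(...).agg`. The sketch below assumes the frame is already sorted most-recent-first, and relies on the groupby `'first'` aggregation skipping NaN, so it returns the most recent non-NaN value in each group. `build_agg_spec` and its argument names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def build_agg_spec(max_cols, recent_cols, mean_cols):
    """Hypothetical helper: map each column to one of the three rules.

    Assumes the frame is sorted most-recent-first, so 'first' (which
    skips NaN within each group) picks the most recent non-NaN value.
    """
    spec = {}
    spec.update({col: "max" for col in max_cols})       # highest value
    spec.update({col: "first" for col in recent_cols})  # most recent non-NaN
    spec.update({col: "mean" for col in mean_cols})     # average
    return spec


# Tiny illustrative frame, already sorted most-recent-first.
demo = pd.DataFrame({
    "fruit": ["Apple", "Apple"],
    "price": [4, 5],
    "place": [np.nan, "London"],
    "qty": [5, 6],
})

spec = build_agg_spec(max_cols=["price"], recent_cols=["place"], mean_cols=["qty"])
result = demo.groupby("fruit", as_index=False).agg(spec)
print(result)
```

With this layout, supporting a new column only means adding its name to the appropriate list.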
df:
   fruit        date  price   place  colour  qty
0  Apple  25-12-2023      4     NaN   Green    5
1  Apple  22-11-2023      5  London     Red    6
2  Apple  20-10-2023      6   Paris     NaN    8
3   Pear  19-10-2023      4  Sweden     Red    8
4   Pear  18-10-2023      5  London   Green    8
5   Pear  17-10-2023     10   Paris  Purple    9
Expected output:
   fruit        date  price   place  colour   qty
   Apple  25-12-2023      6  London   Green  6.33   (qty = (5+6+8)/3)
    Pear  19-10-2023     10  Sweden     Red  8.33   (qty = (8+8+9)/3)