I am having an issue with a DataFrame I have created. It has multiple columns along with the 2 columns I'm trying to group by, plus a datetime column.
The table is as follows:
product  number  color  solddate    price
TV       123     green  20/04/2020  50
TV       123     green  19/04/2020  100
I'm trying to return just the row with the highest price, regardless of solddate, but I still need to return the solddate:
product  number  color  solddate    price
TV       123     green  19/04/2020  100
This is on a dataframe which contains approximately 70k rows.
I was trying with:
price = new_df['price']
c_maxes = new_df.groupby(['product', 'number', 'color']).price.transform(max)
new__df2 = c_maxes.loc[c_maxes == new_df.price]
print(new__df2)
But it's not working: if I export the result to Excel, I'm still able to use the dedup function there and remove around 600 rows.
Thanks
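
A minimal sketch of two ways this could be approached, not a definitive fix. It assumes the index of new_df is unique and that price has no missing values. The toy data just mirrors the two example rows above, and the names keys, best and best_with_ties are made up for illustration. Note that the attempt above filters c_maxes (a Series of prices) rather than new_df itself, so the other columns, including solddate, are lost.

```python
import pandas as pd

# Toy data mirroring the two example rows in the question
new_df = pd.DataFrame({
    "product": ["TV", "TV"],
    "number": [123, 123],
    "color": ["green", "green"],
    "solddate": ["20/04/2020", "19/04/2020"],
    "price": [50, 100],
})

keys = ["product", "number", "color"]  # hypothetical name for the group-by columns

# Option 1: idxmax returns the index label of the highest-price row in each
# group, so .loc pulls back the full row (solddate included).
# On ties, only the first matching row per group is kept.
best = new_df.loc[new_df.groupby(keys)["price"].idxmax()]

# Option 2: build a boolean mask and apply it to the DataFrame itself,
# not to the Series of maxes. This keeps every row that ties for the max.
c_maxes = new_df.groupby(keys)["price"].transform("max")
best_with_ties = new_df[new_df["price"] == c_maxes]

print(best)
```

Option 2 keeps all rows that share the maximum price within a group, which might be one source of leftover duplicates; Option 1 returns exactly one row per group.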