0

I have a data frame;

Date     Price  Product
1/1/12   22       Pen
1/2/12   44      Paper
1/2/12   33      Paper
1/3/12   34      Paper

And I want to just have the min value if there are duplicates for Date and Product.

So the expected output is

Date     Price  Product
1/1/12   22       Pen
1/2/12   33      Paper
1/3/12   34      Paper

I am happy to keep the data in the flat file format or create a time series pivot table.

The only option I can currently see is to sort by price (highest to lowest) and then remove duplicates and keep 'last'. but was keen to explore if there is a better way to do this

fred.schwartz
  • 2,023
  • 4
  • 26
  • 53

2 Answers2

2
df.sort_values('Price', ascending=False).groupby(['Date','Product'],sort=False).last()


                Price
Date    Product 
1/2/12  Paper   33
1/3/12  Paper   34
1/1/12  Pen     22

Feedback from cs95 was accurate.

sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • 1
    The last value is not necessarily the minimum. Sort by descending Price first, then group with sort=False, to make this work. – cs95 Feb 24 '20 at 11:59
  • yep; u r spot on @cs95. thanks – sammywemmy Feb 24 '20 at 12:03
  • Your answer still needs a bit more work. Since you're using last(), you want to sort in descending order. Otherwise use first(). – cs95 Feb 24 '20 at 12:04
1

You can first sort by Price and then drop dupes by Date and Product.

df.sort_values(by=['Price']).drop_duplicates(subset=['Date', 'Product'])


    Date    Price   Product
0   1/1/12  22      Pen
2   1/2/12  33      Paper
3   1/3/12  34      Paper
Allen Qin
  • 19,507
  • 8
  • 51
  • 67