0

We count stock over different fruits/vegetables on arbitrary days and save the data into this dataframe:

| product   | stock     | date          |
|---------  |-------    |------------   |
| carrots   | 92        | 2020-05-01    |
| carrots   | 285       | 2020-05-02    |
| apples    | 694       | 2020-05-01    |
| apples    | 586       | 2020-05-02    |
| oranges   | 58        | 2020-05-01    |
| oranges   | 45        | 2020-05-04    |

Using pandas I'd like to know my most recent stock for all types of produce:

| product   | stock     | date          |
|---------  |-------    |------------   |
| carrots   | 285       | 2020-05-02    |
| apples    | 586       | 2020-05-02    |
| oranges   | 45        | 2020-05-04    |

Heck, if there's a faster way to do it without the date, that'd be even finer!

| product   | stock     |
|---------  |-------    |
| carrots   | 285       |
| apples    | 586       |
| oranges   | 45        |

My no good for-loop (pseudo)-code takes minutes to run:

new_df = {}
for prod in df.product.unique():
  maxdate = df[df.product == prod].date.max()
  recentstock = df[(df.product == prod) & (df.date == maxdate)]
  new_df[prod] = recentstock

new_df = pd.DataFrame(new_df)

Is there a nicer pandas-style solution using aggfuncs, groupby or something else to do this?

DannyDannyDanny
  • 838
  • 9
  • 26

1 Answers1

0

You can sort values by product and date and then take the last sample in each group:

df.sort_values(['product', 'date']).groupby('product').tail(1)
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73