2

This is my below data frame "df" sorted by date

item  price    date
  A    11.2   2021-12-31
  B    32.4   2021-12-31
  C    75.5   2021-12-31 
  A    89.3   2021-12-30
  B    12.2   2021-12-30 
  C    14.2   2021-12-30

basically I just need the Data at the last available date

i.e

 item  price    date
  A    11.2   2021-12-31
  B    32.4   2021-12-31
  C    75.5   2021-12-31 

I tried the same with the below code , its not work as expected. I am new to pandas kindly help.

df = df.set_index(['date'])
df = df.loc[df['date'][0]]
rpanai
  • 12,515
  • 2
  • 42
  • 64
massu1000
  • 209
  • 2
  • 10

3 Answers3

2

You can extract the indices of the maximum dates for each item, and then select them:

idxs = df.groupby(['item'])['date'].transform(max) == df['date']
print(df[idxs])
sagi
  • 40,026
  • 6
  • 59
  • 84
1

Don't forget that you can always use dict = df.to_dict() or list = df.toList() and use a dict or a list that you are probably more experienced with.

White_Sirilo
  • 264
  • 1
  • 11
1

You can sort your dataframe based on your item and date columns (in ascending order which is the default) and return the last row using tail:

df.sort_values(['item','date']).groupby(['item']).tail(1)

  item  price       date
0    A   11.2 2021-12-31
1    B   32.4 2021-12-31
2    C   75.5 2021-12-31

The pd.to_datetime part will ensure that your date is of datetime type. So essentially you need:

df.assign(date = pd.to_datetime(df['date'])).sort_values(['item','date']).groupby(['item']).tail(1)
sophocles
  • 13,593
  • 3
  • 14
  • 33