Delete every row except the latest date of each year, then convert the date to just the year

Question

I am trying to keep only the rows in a dataframe that have the latest date in each year (from 2000 through 2018), and then convert the date from dd-mm-yyyy to just the year number.

So far I have only imported the data:

import pandas as pd
import pandas_datareader.data as web  # assumed source of the `web` alias

df_spx = web.DataReader('^GSPC', 'yahoo', start='2000', end='2018')
df_spx.reset_index(inplace=True)
df_spx['Date'] = pd.to_datetime(df_spx['Date'])
df_spx

And the output is this (as an image link, since I can't post pictures yet):

[image: output of df_spx, not reproduced here]

Please make a [good example](https://stackoverflow.com/a/20159305/3620003). I'm not typing that image into my interpreter. Include the desired output. Five rows max. As executable code or copy-pasteable. — timgeb, Jan 24 '22 at 15:15
@timgeb thank you for the feedback. I already got an answer below that worked, and I will improve my questions in the future — brankoo, Jan 24 '22 at 15:43

score 0 · Accepted Answer · answered Jan 24 '22 at 15:19

First, sort the dataframe by the 'Date' column.

Then, reduce each date to its year so that you can group by it.

Finally, take the last row of each group to get the latest row per year, as follows:

import pandas as pd

# sample dataframe
df = pd.DataFrame({
    'Date': ['2000-01-03', '2000-01-04', '2018-12-19', '2018-12-18'],
    'High': [1, 2, 3, 4],
})

df['Date'] = pd.to_datetime(df['Date'])
df.sort_values(by=['Date'], inplace=True)  # order by Date
df['Date'] = df['Date'].dt.year  # already ordered by Date, then leave year only.

df = df.groupby('Date').tail(1)  # get latest row per year

print(df)
#    Date  High
# 1  2000     2  <- latest row for 2000 (2000-01-04)
# 2  2018     3  <- latest row for 2018 (2018-12-19)
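
If you would rather not sort the dataframe in place, a minimal sketch of an alternative is to look up each year's latest row with idxmax. It assumes the same sample columns ('Date' and 'High') as above; the names latest_idx and latest are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, shaped like the example above.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2000-01-03', '2000-01-04', '2018-12-19', '2018-12-18']),
    'High': [1, 2, 3, 4],
})

# For each calendar year, find the index label of the row with the latest date.
latest_idx = df.groupby(df['Date'].dt.year)['Date'].idxmax()

# Select those rows, then replace the full date with just the year.
latest = df.loc[latest_idx].copy()
latest['Date'] = latest['Date'].dt.year

print(latest)
```

This leaves the original df untouched and does not depend on row order, at the cost of a second lookup through df.loc.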
