I am trying to keep only the rows of a dataframe that are the latest for each year (from 2000 to 2018), and then convert the date from dd-mm-yyyy to just the year number.
So far I have only imported the data:
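import pandas as pd
import pandas_datareader.data as web

# Download S&P 500 data and turn the Date index into a regular column
df_spx = web.DataReader('^GSPC', 'yahoo', start='2000', end='2018')
df_spx.reset_index(inplace=True)
df_spx['Date'] = pd.to_datetime(df_spx['Date'])
df_spx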
And the output is this (as an image at a URL, since I can't post pictures yet):
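
To make the goal concrete, here is a minimal sketch of the result I am after, run on a small made-up frame rather than the real download. The `Close` values, the variable names `df` and `last_per_year`, and the choice of `groupby(...).tail(1)` are all assumptions for illustration; it is only one possible way to do it, not necessarily the best one.

```python
import pandas as pd

# Made-up sample rows standing in for the real S&P 500 frame (values are illustrative only)
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2000-12-28", "2000-12-29", "2001-06-15", "2001-12-31", "2002-12-31"]
    ),
    "Close": [1334.22, 1320.28, 1214.36, 1148.08, 879.82],
})

# Sort chronologically so that the last row in each group is the latest date of that year
df = df.sort_values("Date")

# Group by calendar year and keep only the final row of each group
last_per_year = df.groupby(df["Date"].dt.year).tail(1).copy()

# Replace the full date with just the year number
last_per_year["Date"] = last_per_year["Date"].dt.year
last_per_year = last_per_year.reset_index(drop=True)

print(last_per_year)
```

With the real data, `df` would presumably be replaced by `df_spx` after the `reset_index` step, so that `Date` is a regular datetime column.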