I'm new to Python and pandas.
I'm trying to write code that compares the second-to-last row of a pandas dataframe indexed by datetime with the last row of the dataframe. However, I am having trouble getting the indexing to work.
For example, the code
import numpy as np
import pandas as pd
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(np.random.randn(10, 4), index=dti, columns=['value', 'value1', 'value2', 'value3'])
produces a dataframe like this (the values are random):
               value    value1    value2    value3
2018-01-01  0.249018 -0.630937  1.120733  0.343331
2018-01-02 -0.347979  0.218776 -0.327967  0.569075
2018-01-03  0.879929  0.316554  0.159652 -0.872002
2018-01-04 -1.492683 -0.560168 -0.871728  1.739402
2018-01-05 -3.187776 -1.399635  0.873503 -0.098537
2018-01-06 -0.482267 -0.174304  0.821850  1.530861
2018-01-07  0.385600 -0.395019 -0.507524 -0.033996
2018-01-08  1.358576 -2.107309  0.956507 -0.360288
2018-01-09 -0.283369  0.630684  1.148554 -1.907604
2018-01-10  0.896697  0.030671  0.131533 -1.325552
I'm trying to do a comparison like this:
if df.loc[last_row, ['value']] > df.loc[second_last_row, ['value']]:
    # Compare to see if 'value' in 2018-01-10 is higher than 'value' in 2018-01-09
I have tried using
length = len(df)
if df.loc[length-1, ['value']] > df.loc[length-2, ['value']]:
but I get an error
TypeError: cannot do index indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [5] of <class 'int'>
It seems like the DatetimeIndex cannot be manipulated the same way as regular integer indices.
I have also considered using
df['value'].tail(1)
but I'm not sure how I would be able to access the second last row using the .tail method.
I need advice on how I would be able to retrieve data from the tail end of my dataframe based on its distance from the end of the dataframe. Thanks!
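One possible approach, sketched here on the assumption that access by position is what is wanted: .loc looks rows up by label (here, a date), while .iloc looks them up by integer position, so negative positions count back from the end. The seeded generator and the names last_value, second_last_value, is_higher and second_last_via_tail are illustrative only and are not part of the original code. Passing 'value' rather than ['value'] returns a single number instead of a one-element Series, which is what an if test needs.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is only there for repeatability.
rng = np.random.default_rng(0)
dti = pd.date_range('2018-01-01', periods=10, freq='D')
df = pd.DataFrame(rng.standard_normal((10, 4)), index=dti,
                  columns=['value', 'value1', 'value2', 'value3'])

# .iloc selects by integer position, so -1 is the last row and -2 the one before it.
last_value = df['value'].iloc[-1]
second_last_value = df['value'].iloc[-2]

if last_value > second_last_value:
    print('value rose on the last day')
else:
    print('value did not rise on the last day')

# Label-based alternative: fetch the labels by position, then use .loc as usual.
last_label = df.index[-1]
second_last_label = df.index[-2]
is_higher = df.loc[last_label, 'value'] > df.loc[second_last_label, 'value']

# .tail(2) returns the last two rows; its first element is the second-to-last row.
second_last_via_tail = df['value'].tail(2).iloc[0]
```

All three routes read the same two cells, so which one to use is mostly a matter of whether the rest of the code thinks in positions or in dates.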