0

I have this dataframe

                     open      high       low     close      volume
TimeStamp                                                              
2017-12-22 13:15:00  12935.00  13200.00  12508.71  12514.91  244.728611
2017-12-22 13:30:00  12514.91  12999.99  12508.71  12666.34  150.457869
2017-12-22 13:45:00  12666.33  12899.97  12094.00  12094.00  198.680014
2017-12-22 14:00:00  12094.01  12354.99  11150.00  11150.00  256.812634
2017-12-22 14:15:00  11150.01  12510.00  10400.00  12276.33  262.217127

I want to know if every rows have exactly 15 minutes diference in time So I build a new column with a shift of the first columns

                         open      high       low     close      volume  \
TimeStamp                                                                 
2017-12-20 13:30:00  17503.98  17600.00  17100.57  17119.89  312.773644   
2017-12-20 13:45:00  17119.89  17372.98  17049.00  17170.00  322.953671   
2017-12-20 14:00:00  17170.00  17573.00  17170.00  17395.74  236.085829   
2017-12-20 14:15:00  17395.74  17398.00  17200.01  17280.00  220.467382   
2017-12-20 14:30:00  17280.00  17313.94  17150.00  17256.05  222.760598   

                                new_time  
TimeStamp                                 
2017-12-20 13:30:00  2017-12-20 13:45:00  
2017-12-20 13:45:00  2017-12-20 14:00:00  
2017-12-20 14:00:00  2017-12-20 14:15:00  
2017-12-20 14:15:00  2017-12-20 14:30:00  
2017-12-20 14:30:00  2017-12-20 14:45:00  

Now I want to locate every row that don't respect the 15minutes diference rule so I did

dfh.loc[(dfh['new_time'].to_pydatetime()-dfh.index.to_pydatetime())>datetime.timedelta(0, 900)]

I get this error,

    Traceback (most recent call last):
  File "<pyshell#252>", line 1, in <module>
    dfh.loc[(dfh['new_time'].to_pydatetime()-dfh.index.to_pydatetime())>datetime.timedelta(0, 900)]
  File "C:\Users\Araujo\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'to_pydatetime'

Is there any way of do this?

EDIT:

Shift just works with periodic, there is any way of do this with non periodic?

Constantin Groß
  • 10,719
  • 4
  • 24
  • 50
hopieman
  • 399
  • 7
  • 22

2 Answers2

1

This would work:

import pandas as pd
import numpy as np
import datetime as dt

data = [            
['2017-12-22 13:15:00',  12935.00,  13200.00,  12508.71,  12514.91,  244.728611],
['2017-12-22 13:30:00',  12514.91,  12999.99,  12508.71,  12666.34,  150.457869],
['2017-12-22 13:45:00',  12666.33,  12899.97,  12094.00,  12094.00,  198.680014],
['2017-12-22 14:00:00',  12094.01,  12354.99,  11150.00,  11150.00,  256.812634],
['2017-12-22 14:15:00',  11150.01,  12510.00,  10400.00,  12276.33,  262.217127]
]

df = pd.DataFrame(data, columns = ['Timestamp', 'open', 'high', 'low', 'close', 'volume'])

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df['plus_15'] = df['Timestamp'].shift(1) + dt.timedelta(minutes = 15)

df['valid_time'] = np.where((df['Timestamp'] == df['plus_15']) | (df.index == 0), 1, 0)

print(df[['Timestamp', 'valid_time']])

#output
            Timestamp  valid_time
0 2017-12-22 13:15:00           1
1 2017-12-22 13:30:00           1
2 2017-12-22 13:45:00           1
3 2017-12-22 14:00:00           1
4 2017-12-22 14:15:00           1

So create a new column, plus 15, that looks at the previous timestamp and adds 15 minutes to it. Then create another column, valid time, which compares the timestamp column to the plus 15 column, and marks 1 when they are equal and 0 when they are not.

kjmerf
  • 4,275
  • 3
  • 21
  • 29
0

Can we do something like this?

import pandas as pd
import numpy as np

data = '''\
TimeStamp            open      high       low     close      volume
2017-12-22T13:15:00  12935.00  13200.00  12508.71  12514.91  244.728611
2017-12-22T13:30:00  12514.91  12999.99  12508.71  12666.34  150.457869
2017-12-22T13:45:00  12666.33  12899.97  12094.00  12094.00  198.680014
2017-12-22T14:00:00  12094.01  12354.99  11150.00  11150.00  256.812634
2017-12-22T14:15:00  11150.01  12510.00  10400.00  12276.33  262.217127'''

df = pd.read_csv(pd.compat.StringIO(data), 
                 sep='\s+', parse_dates=['TimeStamp'], index_col=['TimeStamp'])

df['new_time'] = df.index[1:].tolist()+[np.NaN]
# df['new_time'] = np.roll(df.index, -1)  # if last is not first+15min

# use boolean indexing to filter away unwanted rows
df[[(dt2-dt1)/np.timedelta64(1, 's') == 900 
    for dt1,dt2 in zip(df.index.values,df.new_time.values)]]
Anton vBR
  • 18,287
  • 5
  • 40
  • 46