I have 5 years of S & P 500 data that I’m trying to group into specific time chunks to run some analysis on. My data is in 5 minute increments. After reading it into a DataFrame called dated, I thought I could groupby it into chunks, encompassing consecutive rows, by the time increments between the Globex open and close for each trading day. The Globex open is at 3:00PM (15:00) PST and the close is 1:15PM PST (13:15), 22.25 hours later. So I would like to chunk the data from the 3:00PM open THROUGH to the 1:15PM close--that's approximately 275 rows per chunk.
The difficulty is that a ‘trading day’ spans 2 different dates, for eaxmple, a session opens on Sunday 3-6-2016 at 15:00 and closes on Monday, 3-7-2016 at 13:15. I’m not able to groupby the ‘Date’ column because of course all of 3-6 would be in one chunk and 3-7 in another, when I need the data chunk to span both dates, in order to get the whole Globex day in one chunk.
Being relatively new to both Pandas and Python I don’t know what method to use to group the data into the chunks that I want. Once the data is segregated, I’d like to extract the high and low of each session/chunk into separate columns and also a column for just the 15:05 open price.
Here is a sample of what my data looks like:
Date Time Open High Low Close Up Down
0 2015-08-03 15:05 2073.50 2074.00 2073.25 2073.25 210 563
1 2015-08-03 15:10 2073.25 2073.25 2072.25 2072.75 118 632
2 2015-08-03 15:15 2072.75 2072.75 2072.25 2072.50 132 85
3 2015-08-03 15:20 2072.50 2072.75 2072.25 2072.50 95 312
4 2015-08-03 15:25 2072.50 2074.00 2072.50 2073.50 372 264
Originally, the ‘Date’ column values looked like this: 8/3/2015. I thought that it may not be read as an actual date object, so I changed the values using to_datetime() to get the ‘Date’ column values to read like an actual date object as seen in my sample DataFrame above.
dated['Date'] =pd.to_datetime(dated['Date'])
When I tried to change the values in the ‘Time’ column using to_datetime(), it successfully changes the time from 15:05 to 15:05:00, but it also added the day, so it looked like this: ‘2016-03-05 15:05:00’, the problem being that it used today’s date for the date. This obviously won’t work for me as my data is historical and the dates and times are references to historical prices.
The reason I was trying to change the ‘Time’ column to a datetime object is I thought I would be able to slice it into the chunks that I need during a groupby operation:
dated = dated['Date'].groupby(dated['15:05' : '13:20'])
This produced and error:
IndexError: invalid slice
So I’d appreciate any help solving this problem--pointing me in the correct areas of research. I've been essentially reading the pandas documentation piece by piece, trying out different methods, but as I'm not sure what steps to start with, I've been picking topics at random to read and not finding the answer.
Thanks, Ana