I am working on converting a project that I currently have programmed in Excel to R. The reason for doing this is that the code includes a lot of logic and data, which means that Excel's performance is very poor. So far I have coded up around 50% of this project in R and I am extremely impressed with the performance.
The code I have does the following:
- Loads 5-minute time-series data of a stock and adds a day-of-the-year column labelled doy, as in the example below (a sketch of this step follows the sample data).
The OHLC data looks like this:
Date Open High Low Close doy
1 2015-09-21 09:30:00 164.6700 164.7100 164.3700 164.5300 264
2 2015-09-21 09:35:00 164.5300 164.9000 164.5300 164.6400 264
3 2015-09-21 09:40:00 164.6600 164.8900 164.6000 164.8900 264
4 2015-09-21 09:45:00 164.9100 165.0900 164.9100 164.9736 264
5 2015-09-21 09:50:00 164.9399 165.0980 164.8200 164.8200 264
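As a minimal sketch of that step, assuming the raw bars sit in a data frame DIA_5 with a POSIXct Date column, the doy column can be derived in base R:

# Assumes DIA_5 has a POSIXct Date column; format code "%j" gives the day of the year
DIA_5$doy <- as.integer(format(DIA_5$Date, "%j"))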
- Converts that data to a table called df
df <- tbl_df(DIA_5)
- Using mainly plyr with a hint of TTR, it filters through the data, creating a set of 10 new variables in a new data frame called data. See below (a sketch of this aggregation follows the dput output):
data <- structure(list(
  doy = c(264, 265, 266, 267, 268, 271, 272, 11, 12, 13),
  Date = structure(c(1442824200, 1442910600, 1442997000, 1443083400,
                     1443169800, 1443429000, 1443515400, 1452504600,
                     1452591000, 1452677400),
                   class = c("POSIXct", "POSIXt"), tzone = ""),
  OR_High = c(164.71, 162.96, 163.38, 161.37, 163.91, 162.06, 160.22,
              164.5, 165.23, 165.84),
  OR_Low = c(164.37, 162.62, 162.98, 161.06, 163.57, 161.66, 159.7,
             164.06, 164.84, 165.4),
  HOD = c(165.56, 163.36, 163.38, 162.24, 164.43, 162.06, 160.96,
          164.5, 165.78, 165.84),
  LOD = c(165.22, 163.1, 162.98, 161.95, 164.24, 161.66, 160.75,
          164.06, 165.56, 165.4),
  Close = c(164.92, 163.02, 162.58, 161.85, 162.94, 159.84, 160.19,
            163.83, 165.02, 161.38),
  Range = c(0.340000000000003, 0.260000000000019, 0.400000000000006,
            0.29000000000002, 0.189999999999998, 0.400000000000006,
            0.210000000000008, 0.439999999999998, 0.219999999999999,
            0.439999999999998),
  `A-val` = c(NA, NA, NA, NA, NA, NA, NA, 0.0673439999999994,
              0.0659639999999996, 0.0729499999999996),
  `A-up` = c(NA, NA, NA, NA, NA, NA, NA, 164.567344, 165.295964,
             165.91295),
  `A-down` = c(NA, NA, NA, NA, NA, NA, NA, 163.992656, 164.774036,
               165.32705)),
  .Names = c("doy", "Date", "OR_High", "OR_Low", "HOD", "LOD", "Close",
             "Range", "A-val", "A-up", "A-down"),
  row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 78L, 79L, 80L),
  class = "data.frame")
The next part is where it gets complicated. What I need to do is analyse the high and low prices of each 5-minute bar of the day in relation to the A-up, A-down and close values seen in the table. What I am looking for is to compute a score for the day depending on the time spent above the A-up level or below the A-down level.
The way I handled this in Excel was to index each 5-minute high and low price of the time series, then use logic to score the activity in that 5-minute slice: if the low was > the A-up level it was given a 1, and a -1 if the high was < A-down. For the scoring, if price stays > the A-up level or < the A-down level for more than 30 minutes, I score it a 2 or -2. This was achieved with a running sum over the 5-minute results: if it showed more than 5 consecutive ones I knew that price had stayed above the A-up level for over 30 minutes, so I scored it a 2. A sketch of this per-bar scoring is shown below.
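As a rough sketch of that logic, assuming bars holds one day's 5-minute data with that day's levels joined on as columns A_up and A_down (illustrative names):

library(zoo)
# +1 if the bar's low held above A-up, -1 if its high stayed below A-down
bars$score <- with(bars, ifelse(Low > A_up, 1,
                         ifelse(High < A_down, -1, 0)))
# Six 5-minute bars = 30 minutes; a rolling 6-bar sum hitting 6 (or -6)
# means price held beyond the level for the whole window
roll6 <- rollapply(bars$score, 6, sum, fill = NA, align = "right")
held_above <- any(roll6 ==  6, na.rm = TRUE)   # candidate for a 2
held_below <- any(roll6 == -6, na.rm = TRUE)   # candidate for a -2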
For the day's scoring I need to know the following (see the sketch after this list):
- Did price stay above or below an A level for > 30 minutes, or fail by spending < 30 minutes there?
- If price went above and below both levels in one day, which level did it break first?
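Continuing the sketch above, the "which level broke first" question could be answered from the per-bar scores (again, names are illustrative):

# Index of the first bar above A-up and the first below A-down
first_up   <- match(TRUE, bars$score ==  1)
first_down <- match(TRUE, bars$score == -1)
# NA from match() means that level was never broken that day
broke_first <- if (is.na(first_up) && is.na(first_down)) NA_character_
  else if (is.na(first_down) || (!is.na(first_up) && first_up < first_down)) "A-up"
  else "A-down"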
So, after a long-winded intro, my question: does anyone out there have a good idea of the best way to go about coding this? I don't need specific code, but rather suggestions as to which packages may help accomplish it. As I mentioned above, my reason for switching to R was mainly speed, so whatever code is used must be efficient. When I have this coded I intend to wrap it in a loop so that it can analyse several hundred instruments.
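For that outer loop, something along these lines might be a starting point; load_instrument() and score_day() are hypothetical placeholders for your own loader and the per-day scoring sketched above:

# Hypothetical outer loop over many instruments
results <- lapply(instrument_symbols, function(sym) {
  df <- load_instrument(sym)       # hypothetical: returns 5-min OHLC bars with doy
  ddply(df, .(doy), score_day)     # hypothetical: scores each day as above
})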
Thanks.