I have OHLC (Open/High/Low/Close) data, which can be obtained from a finance API or similar sources.
I want to create a target indicator with values -1, 0 and 1, on which I will build a stock classification model.
To create this target variable, I first need another indicator, log(tomorrow's Close / today's Close), which takes values in (-Inf, Inf).
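Writing $C_t$ for the Close on trading day $t$ (notation introduced here only for convenience), this indicator is the one-day forward log return:

$$
r_t = \ln\left(\frac{C_{t+1}}{C_t}\right), \qquad r_t \in (-\infty, \infty)
$$

For small moves $\ln(1 + x) \approx x$, so $r_t$ is close to the simple percentage change; for example, $r_t = 0.0015$ corresponds to a move of roughly 0.15%.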
I then want to assign labels = c(-1, 0, 1) to this log return using breaks = c(-Inf, range_start, range_end, Inf).
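A minimal sketch of this labelling step with base R's `cut()`, using made-up return values and the thresholds from the second question below (-0.0015 and 0.0015). By default `cut()` builds right-closed intervals, so a return exactly equal to `range_end` is labelled 0, not 1, and one exactly equal to `range_start` is labelled -1.

```r
# Thresholds from the question; the returns below are made-up examples
range_start <- -0.0015
range_end   <-  0.0015

log_ret <- c(-0.0100, -0.0015, 0.0000, 0.0015, 0.0200)

# Intervals are (-Inf, range_start], (range_start, range_end], (range_end, Inf]
category <- cut(log_ret,
                breaks = c(-Inf, range_start, range_end, Inf),
                labels = c(-1, 0, 1))

# cut() returns a factor; convert the labels back to integers if a numeric target is needed
category_int <- as.integer(as.character(category))
print(category_int)
```

Keeping the factor is usually fine for classification models in R; the integer conversion is only needed if a numeric target is required.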
My first question: how can I create this target variable without looking into future data? My formula log(tomorrow's Close / today's Close) looks into the future, which is wrong. I want to shift the data frame/inputs backward by one row, treating today as tomorrow and so on, and then calculate the target category (-1, 0, 1) from range_start, range_end and the breaks I define.
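One common arrangement (a suggestion, not the approach in the original code) is to keep the forward-looking return as the target, since that is what the model predicts, and make sure every predictor uses only data available at the close of day t. The sketch below uses `dplyr::lead()` for the target and `dplyr::lag()` for a predictor on a made-up price series; `prev_ret` is a hypothetical example feature.

```r
library(dplyr)

# Made-up example prices; in practice this would be the OHLC data frame
prices <- data.frame(
  Date  = as.Date("2024-01-01") + 0:5,
  Close = c(100, 101, 100.5, 102, 101, 103)
)

prices <- prices %>%
  arrange(Date) %>%
  mutate(
    # Target: log return from today's Close to tomorrow's Close (NA on the last row)
    TargetIndicator = log(dplyr::lead(Close) / Close),
    # Hypothetical predictor: yesterday-to-today log return, known at today's close
    prev_ret = log(Close / dplyr::lag(Close))
  ) %>%
  # Drop rows where the target or the predictor cannot be computed
  filter(!is.na(TargetIndicator), !is.na(prev_ret))

print(prices)
```

Note that the target on day t is the same number as `prev_ret` on day t + 1; the one-row offset between the two columns is what keeps the predictor free of future information.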
My second question: what is the best way to choose range_start and range_end? For now I am using -0.0015 and 0.0015. Comments and suggestions are welcome, thanks.
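One option (an assumption here, not something from the original post) is to derive `range_start` and `range_end` from the empirical distribution of past log returns instead of fixing them by hand, for example the 1/3 and 2/3 quantiles, so the three classes are roughly balanced. To avoid look-ahead, the quantiles should be computed on the training period only. The returns below are simulated purely for illustration.

```r
set.seed(42)

# Simulated daily log returns, for illustration only
log_ret <- rnorm(600, mean = 0, sd = 0.01)

# Use only the first 70% of observations (the training window) to choose the thresholds
n_train   <- round(0.7 * length(log_ret))
train_ret <- log_ret[seq_len(n_train)]

thresholds  <- quantile(train_ret, probs = c(1/3, 2/3), names = FALSE)
range_start <- thresholds[1]
range_end   <- thresholds[2]

# Apply the same fixed thresholds to every observation, including the test window
category <- cut(log_ret,
                breaks = c(-Inf, range_start, range_end, Inf),
                labels = c(-1, 0, 1))

# Class counts in the training window should be close to one third each
print(table(category[seq_len(n_train)]))
```

Fixed thresholds such as ±0.0015 are easier to interpret, while quantile-based ones adapt to the volatility of the particular stock; which is preferable depends on the model and the data.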
```r
library(dplyr)

# Thresholds for the -1 / 0 / 1 classes (current choice)
range_start <- -0.0015
range_end   <-  0.0015

masterDF_close <- masterDF %>% dplyr::select(Date, Close)

# Prepend one all-NA row to masterDF. Indexing with NA_integer_ keeps the
# column classes (e.g. Date); base rbind() takes column classes from its
# first argument, so a plain logical NA row would turn Date into numeric.
newrow <- masterDF[NA_integer_, , drop = FALSE]
masterDF <- rbind(newrow, masterDF)

# Append one all-NA row to the Close-only copy so both have the same number of rows
newrow2 <- masterDF_close[NA_integer_, , drop = FALSE]
masterDF_close <- rbind(masterDF_close, newrow2)

# masterDF is now shifted down by one row relative to masterDF_close, so on
# each row Close_unshifted holds the Close of the following day
masterDF$Close_unshifted <- masterDF_close$Close

# Shifting data backwards, assuming today's Close as tomorrow's Close and
# yesterday's Close as today's Close.
# Alternative without adding rows (dplyr::lead shifts a vector up by one;
# stats::lag does not shift the values of a plain vector):
# close      <- masterDF$Close
# next_close <- dplyr::lead(close)   # NA on the last row
# head(close, 10)
# head(next_close, 10)
# plot(log(next_close / close))

masterDF$TargetIndicator <- log(masterDF$Close_unshifted / masterDF$Close)

# Drop the all-NA row added at the top
masterDF <- masterDF[-1, ]

# The last row has no next Close, so its target is NA; this labels it as class 0
masterDF$TargetIndicator[is.na(masterDF$TargetIndicator)] <- 0

masterDF_ <- masterDF %>%
  mutate(category = cut(TargetIndicator,
                        breaks = c(-Inf, range_start, range_end, Inf),
                        labels = c(-1, 0, 1)))
```
These are the two operations I am performing in the code.
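For comparison, a shorter sketch of the same two operations using `dplyr::lead()`. As far as I can trace the row shifting in the code above, prepending and appending the NA rows ends up pairing each row's Close with the next day's Close, which is what `lead()` does directly. One deliberate difference: the last row (which has no next Close) is dropped here rather than set to 0. The helper name `make_target()` and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: adds the forward log return and its -1/0/1 category
make_target <- function(df, range_start = -0.0015, range_end = 0.0015) {
  df %>%
    arrange(Date) %>%
    mutate(
      TargetIndicator = log(dplyr::lead(Close) / Close),
      category = cut(TargetIndicator,
                     breaks = c(-Inf, range_start, range_end, Inf),
                     labels = c(-1, 0, 1))
    ) %>%
    # The last row has no next Close, so its target is NA; drop it instead of imputing 0
    filter(!is.na(TargetIndicator))
}

# Made-up example data
toy <- data.frame(
  Date  = as.Date("2024-01-01") + 0:4,
  Close = c(100, 100.1, 99.5, 99.6, 101)
)

result <- make_target(toy)
print(result)
```

Dropping the final row avoids assigning it an artificial class 0 that does not reflect any observed price move.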