In my first attempts in using R I wrote two functions that are not very performant I guess and would appreciate if I can receive some hints on how to make them more performant (vectorized). Both functions come with "test case" at the end.
The first function takes two time series xts objects x and y and returns a series which contains data on how many days x is higher/lower than y.
require('xts')
require('quantmod')
countDaysBelowOrAbove <- function(x, y) {
x <- try.xts(x, error=as.matrix)
y <- try.xts(y, error=as.matrix)
if(is.xts(x) && is.xts(y)) {
xy <- cbind(x,y)
} else {
xy <- cbind( as.vector(x), as.vector(y) )
}
# Count NAs, ensure they're only at beginning of data, then remove.
xNAs <- sum( is.na(x) )
yNAs <- sum( is.na(y) )
NAs <- max( xNAs, yNAs )
if( NAs > 0 ) {
if( any( is.na(xy[-(1:NAs),]) ) ) stop("Series contain non-leading NAs")
}
resultDaysLower <- x
resultDaysHigher <- x
resultDaysLower[!is.na(resultDaysLower)]<-0
resultDaysHigher[!is.na(resultDaysHigher)]<-0
series<-cbind(xy, resultDaysLower, resultDaysHigher)
colnames(series) <- c(names(xy), "cumDaysLower", "cumDaysHigher")
daysLower = 0
daysHigher = 0
for (i in 1:NROW(xy)) {
if (!(is.na(series[,1][i]) | is.na(series[,2][i]))) {
if (series[,1][i] >= series[,2][i]) {
daysLower = 0
daysHigher = daysHigher + 1
}
else {
daysHigher = 0
daysLower = daysLower + 1
}
}
else {
daysLower = 0
daysHigher = 0
}
series$cumDaysLower[i] = daysLower
series$cumDaysHigher[i] = daysHigher
}
return(series)
}
getSymbols("SPY", from='2005-01-01')
SPYclose = Cl(SPY)
getSymbols("QQQQ", from='2005-01-01')
QQQQclose = Cl(QQQQ)
testData = countDaysBelowOrAbove(SPYclose, QQQQclose)
The second function I would appreciate help with performance optimization is below. The function takes as parameter an xts object series and an xts object representing lengths of interval to calculate minimum of series at a specified time. The function returns calculated minimum of series with specified window for minimum calculation set in lengths.
minimumWithVaryingLength<-function(series, lengths) {
series <- try.xts(series, error=as.matrix)
lengths <- try.xts(lengths, error=as.matrix)
if(is.xts(series) && is.xts(lengths)) {
serieslengths <- cbind(series,lengths)
} else {
serieslengths <- cbind( as.vector(series), as.vector(lengths) )
}
# Count NAs, ensure they're only at beginning of data, then remove.
seriesNAs <- sum( is.na(series) )
lengthsNAs <- sum( is.na(lengths) )
NAs <- max( seriesNAs, lengthsNAs )
if( NAs > 0 ) {
if( any( is.na(serieslengths[-(1:NAs),]) ) ) stop("Series contain non-leading NAs")
}
result <- series
result[!is.na(result)]<-0
for (i in 1:NROW(serieslengths)) {
if (lengths[i] > 0) {
result[i] <- runMin(series, n=lengths[i], cumulative=FALSE)[i]
}
else {
result[i] <- 0
}
}
return(result)
}
getSymbols("SPY", from='2005-01-01')
SPYclose = Cl(SPY)
getSymbols("QQQQ", from='2005-01-01')
QQQQclose = Cl(QQQQ)
numDaysBelow = countDaysBelowOrAbove(SPYclose, QQQQclose)
test = minimumWithVaryingLength(SPYclose, numDaysBelow)
Thanks in advance for your kind help.
Kind regards, Samo.