
I have the following time-series price data:

18/01/2008  7.4811
22/01/2008  7.5267
31/01/2008  7.8289
01/02/2008  7.82
...
30/10/2008  7.81
31/10/2008  7.75

I built a function calVariation to calculate the price variation as variation = log(data/data[1,1]).
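For illustration, here is the formula applied to the first four prices listed above. This minimal sketch uses a plain numeric vector rather than the one-column matrix used in the code further down. Because the variation is a log-return, the threshold of -0.05 used in the code corresponds to a price drop of about 4.9%, since exp(-0.05) - 1 is roughly -0.0488.

```r
# First four prices from the sample data above
prices <- c(7.4811, 7.5267, 7.8289, 7.82)

# Log variation of every price relative to the first one;
# the first element is always log(1) = 0
variation <- log(prices / prices[1])
print(round(variation, 4))

# Relative price change that a log threshold of -0.05 corresponds to
print(exp(-0.05) - 1)
```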

  • calVariation starts from line 1 of the data, i.e., it calculates variations for data[1:nrow(data),], then finds the first value in the resulting variation array that is less than the threshold of -5% (threshold <- -0.05).
  • If nothing is found, calVariation runs again, this time starting from the next line of the data, i.e., it computes variations for data[2:nrow(data),].
  • If the first variation below the threshold is at line n, it saves lines 1 to n of the original data to one column of a matrix mat. The original data is then reduced to data[n:nrow(data),], which becomes the input for the next call to calVariation.
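The three rules above can also be sketched without recursion. The helper below, segment_prices(), is hypothetical (not from the question); it assumes a plain numeric vector of prices and a non-positive threshold, and it returns the index range of each segment instead of filling the matrix mat. As in the recursive code below, a segment begins at the line where the search succeeded, which may be later than the first remaining line. Using which() instead of if (any(...)) also means NA values are simply skipped rather than breaking the if.

```r
# Hypothetical helper: split a numeric price vector into segments, each ending
# at the first log variation below `threshold`. Returns a list of index vectors.
segment_prices <- function(prices, threshold = -0.05) {
  # a positive threshold would match the first price itself and loop forever
  stopifnot(threshold <= 0)
  n <- length(prices)
  segments <- list()
  start <- 1
  while (start < n) {
    found <- FALSE
    s <- start
    # Move the starting line forward until some later price drops enough
    while (s < n) {
      vari <- log(prices[s:n] / prices[s])
      hit <- which(vari < threshold)   # which() ignores NA values
      if (length(hit) > 0) {
        end <- s + hit[1] - 1          # position of that price in `prices`
        segments[[length(segments) + 1]] <- s:end
        start <- end                   # next search starts at line n
        found <- TRUE
        break
      }
      s <- s + 1
    }
    if (!found) break                  # no further drop below the threshold
  }
  segments
}

# Usage with made-up prices: segments 1:3 and 3:5
print(segment_prices(c(10, 10.2, 9.4, 9.5, 8.8, 9)))
```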

Following is my code.

pathway <- 'C:/'
decimal <- ","
threshold <- -0.05
database <- as.matrix(read.csv(paste(pathway,"Data_origin.csv",sep=""), header = FALSE, sep = ";", dec = decimal))

data_p <- as.matrix(database[,2])
data_p <- as.matrix(as.numeric(data_p))
rownames(data_p) <- database[,1]

calVariation <- function(mData, threshold){ 
  if(nrow(mData) > 1) {  
    vari <- log(mData/mData[1,1])

    if (any(vari < threshold) == FALSE) { # No value below the threshold (-0.05)
      mData <- as.matrix(mData[2:nrow(mData),])
      mData <- calVariation(mData, threshold)
    }

    else { # Found a value below the threshold
      threshold_id <- min(which(vari < threshold))
      mData <- as.matrix(mData[1:threshold_id, ])
    }
  } else {
    mData <- NULL
  }

  return(mData)
}


data <- data_p
mat <- NULL
rowid <- 0

while (nrow(data) > 1 && is.null(data) == FALSE) {
  temp <- matrix(NA, nrow(data_p), 2)
  data <- calVariation(data, threshold)

  if (is.null(data) == FALSE) {
    temp[1:nrow(data), 1] <- rownames(data) 
    temp[1:nrow(data), 2] <- data 
    rowid <- rowid + nrow(data)     
    mat <- cbind(mat, temp)
    data <- as.matrix(data_p[rowid:nrow(data_p),])    
  } else {
    break()
  }

}

It returns this error: Error in if (any(vari < threshold) == FALSE) { : missing value where TRUE/FALSE needed. I guess the error happens when vari becomes NA, but I tried using something like the is.na function to get rid of it, and it didn't work.
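The guess is right: any() returns NA rather than FALSE when no element is TRUE but at least one is NA, and if () cannot branch on NA. A minimal illustration follows, with two NA-safe ways to write the test. Either could replace the condition in calVariation; which one fits depends on whether NA rows should count as "not below the threshold", which is an assumption here. The NA values themselves most likely come from entries that as.numeric() could not convert, though that is also an assumption since the data file is not shown.

```r
vari <- c(0, 0.01, NA)   # no value below the threshold, but one NA

print(any(vari < -0.05))                # NA, so if (... == FALSE) fails
print(any(vari < -0.05, na.rm = TRUE))  # FALSE: NA elements are ignored
print(isTRUE(any(vari < -0.05)))        # FALSE: NA is treated as "not found"

# Reproduce the error from the question and capture its message
print(tryCatch(if (NA == FALSE) 1, error = function(e) conditionMessage(e)))
```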

The original data for the test can be found here. Many thanks in advance.
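Since the linked file isn't reproduced here, a small stand-in can be built in memory to test the loading step. This sketch assumes the file has a date column and a price column separated by ";" with decimal commas (as the read.csv() options above imply) and uses only the first three rows shown in the question. read.csv2() defaults to sep = ";" and dec = ",", and keeping the price column numeric avoids round-tripping it through a character matrix; anyNA() then shows right away whether any price failed to parse.

```r
# Write three sample rows to a temporary file (stand-in for Data_origin.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("18/01/2008;7,4811", "22/01/2008;7,5267", "31/01/2008;7,8289"), tmp)

# read.csv2() uses sep = ";" and dec = "," by default
database <- read.csv2(tmp, header = FALSE, stringsAsFactors = FALSE)

# One-column numeric matrix with the dates as row names, like data_p
data_p <- matrix(database[, 2], ncol = 1, dimnames = list(database[, 1], NULL))

print(anyNA(data_p))   # TRUE would point to prices that could not be parsed
dput(data_p)           # code that recreates data_p, handy for sharing
```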

Tung
Trung
  • Please share your data using `dput()` so others can help. See more here: [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Tung Jul 22 '18 at 02:24

1 Answer


Your actual error is in the while block:

Error in if (any(vari < threshold) == FALSE) { : missing value where TRUE/FALSE needed
Error in data_p[rowid:nrow(data_p), ] : subscript out of bounds
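The second error comes from the index bookkeeping in the while loop. rowid is increased by nrow(data), but that count does not match the position of the segment's last row: consecutive segments share a row (so rowid overshoots by one per segment), and rows skipped by calVariation() are not counted at all. Once rowid exceeds nrow(data_p), rowid:nrow(data_p) counts downwards and asks for rows that do not exist. One possible fix, sketched below with a hypothetical helper segment_end() and assuming the dates in the row names are unique, is to locate the last row of each returned segment by name and stop when fewer than two rows remain.

```r
# Small stand-in for data_p: prices with dates as row names
data_p <- matrix(c(7.4811, 7.5267, 7.8289, 7.82), ncol = 1,
                 dimnames = list(c("18/01/2008", "22/01/2008",
                                   "31/01/2008", "01/02/2008"), NULL))

# Once rowid passes the last row, the sequence runs backwards
rowid <- 6
print(rowid:nrow(data_p))   # 6 5 4 -> rows 6 and 5 do not exist

# Hypothetical helper: row of data_p where a returned segment ends,
# found by matching its last row name (assumes unique dates)
segment_end <- function(segment, full) {
  match(tail(rownames(segment), 1), rownames(full))
}

seg <- data_p[2:3, , drop = FALSE]   # pretend calVariation() returned rows 2-3
rowid <- segment_end(seg, data_p)    # 3

# Continue from that row only while at least two rows remain
if (rowid < nrow(data_p)) {
  data <- data_p[rowid:nrow(data_p), , drop = FALSE]
}
```

In the question's loop, segment_end(data, data_p) would take the place of rowid <- rowid + nrow(data), and the guard would replace the unconditional data_p[rowid:nrow(data_p),].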

One suggestion for the future: when diagnosing hard-to-debug errors, use tryCatch. Here is calVariation() rewritten with it:

calVariation <- function(mData, threshold){
  tryCatch({
    if (nrow(mData) > 1) {
      vari <- log(mData/mData[1,1])

      if (any(vari < threshold) == FALSE) { # No value below the threshold
        mData <- as.matrix(mData[2:nrow(mData),])
        mData <- calVariation(mData, threshold)
      } else { # Found a value below the threshold
        threshold_id <- min(which(vari < threshold))
        mData <- as.matrix(mData[1:threshold_id, ])
      }
    } else {
      mData <- NULL
    }
  }, error = function(err) {
    # print the error message
    print(paste("error:", conditionMessage(err)))
  })
  return(mData)
}
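Note that in this rewrite the value returned by tryCatch() is discarded: after an error, the handler prints the message and the function returns mData unchanged. When the failure should be visible to the caller, the handler's return value can be used directly. Here is a small self-contained illustration with a hypothetical function check_variation():

```r
# Hypothetical example: tryCatch() returns the value of the expression,
# or the value of the handler if an error occurred
check_variation <- function(vari, threshold) {
  tryCatch(
    {
      if (any(vari < threshold) == FALSE) "none below threshold" else "found"
    },
    error = function(err) {
      # conditionMessage() extracts the text of the error
      message("error: ", conditionMessage(err))
      NA_character_
    }
  )
}

print(check_variation(c(0, -0.1), -0.05))      # "found"
print(check_variation(c(0, 0.01), -0.05))      # "none below threshold"
print(check_variation(c(0, 0.01, NA), -0.05))  # reports the error, returns NA
```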
Aji