We created a table in R with S&P 500 values and added columns such as the 10-day simple moving average. We set the NA values to 0. Example:

myStartDate <- '2020-01-01' 
myEndDate   <- Sys.Date()
Dataset$SMA10 <- SMA(Dataset[,"Close"], 10) 
Dataset$SMA10 <- as.numeric(Dataset$SMA10)
Dataset$SMA10[is.na(Dataset$SMA10)] <- 0

Our goal is to build a random forest model, so we split the data into a training set and a validation set:

set.seed(100) 
train <- sample(nrow(Dataset), 0.5*nrow(Dataset), replace = FALSE) 
TrainSet <- Dataset [train,] 
ValidSet <- Dataset [-train,] 

When we try to fit the model with the following code:

model1 <- randomForest(SMA10~.,data=TrainSet, mtry=5, importance=TRUE,ntree=500) 
print(model1) 

we get this error message:

Error in x[, i] <- frame[[i]] : number of items to replace is not a multiple of replacement length

Searching the forum for this error, we found that it is usually related to NA values. That confuses us, because our table contains no NA values. Can you tell us what we are doing wrong? Thank you very much in advance.

  • It looks like you're new to SO; welcome to the community! The answer to your question is probably specific to your data. To answer your question, make your question reproducible. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat May 09 '22 at 22:42
  • I get the feeling that your difficulty might live with your `size = 0.5*nrow(Dataset)`, if say, nrow is odd. – Chris May 10 '22 at 03:52

1 Answer

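Some TTR functions, such as ADX, return several columns at once (for example ADX[,"DX"], ADX[,"ADX"], ...). If you assign such a result to a single column of your data frame, that column holds a whole matrix rather than a plain vector. randomForest cannot handle such columns, which is why this error message shows up.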

To fix this, keep only the single column you need when you build your data frame, for example:

data <- data.frame(..., ADX = ADX(data[, 2:4])[, "ADX"])

Handle the output of MACD the same way, since it also returns more than one column.

I assume you used the "quantmod" or "TTR" package, so carefully inspect the data frame you pass to randomForest and correct any columns like this.
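
One way to do that inspection is sketched below in base R. This is only an illustration: the data frame `df` and the column `IND` are made up, and the matrix is built by hand to mimic what a multi-column indicator would leave behind, so no real TTR output is used.

```r
# Hypothetical data frame; IND mimics a multi-column indicator result
# that was assigned to a single data frame column.
df <- data.frame(Close = c(10, 11, 12, 13))
df$IND <- cbind(a = 1:4, b = 5:8)   # one "column" that is really a 4 x 2 matrix

# Flag every column that is a matrix or a nested data frame
is_multi <- vapply(
  df,
  function(col) is.matrix(col) || is.data.frame(col),
  logical(1)
)
multi_cols <- names(df)[is_multi]
print(multi_cols)

# Keep only the sub-column you actually want as a plain vector
df$IND <- df$IND[, "a"]
```

Once every column is a plain vector, `str(df)` shows one simple type per column and the data frame can be passed to the model as usual.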

ALLEN