I have a random forest model trained with the caret package that uses both numeric and categorical predictors. I am trying to use this model to make predictions on a new dataset, a RasterStack with one layer per predictor. I converted the categorical raster layer to a factor with the ratify function from the raster package, and I added a raster attribute table (RAT) whose character strings match the labels used in the training set. When I predict, I get the following error:

# Error in predict.randomForest(modelFit, newdata) : 
# Type of predictors in new data do not match that of the training data. 

I think I might be mis-formulating the RAT somehow, or else I am misunderstanding the functionality of the RAT. Below is a minimal reproducible example. Any thoughts on what is going wrong?

require(caret)
require(raster)

set.seed(150)
data("iris")

# Training dataset
iris.x<-iris[,1:4]
iris.x$Cat<-"Low"
iris.x$Cat[1:60]<-"High"
iris.x$Cat<-as.factor(as.character(iris.x$Cat))
iris.y<-iris$Species

# Train RF model in caret with 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5, p = 0.9)

mod <- train(iris.x, iris.y,
             method = "rf",
             trControl = ctrl)

# Create raster stack prediction dataset
r <- raster(ncol=10, nrow=5)
tt <- sapply(1:4, function(x) setValues(r,  round(runif(ncell(r),1,5))))

#Categorical raster layer with RAT
r_cat<-raster(ncol=10, nrow=5)
r_cat[1:25]<-1
r_cat[26:50]<-2
ratr_cat <- ratify(r_cat)
rat <- levels(ratr_cat)[[1]]
rat$PCN <- c(1,2)
rat$PCN_level <- c('Low','High')
levels(ratr_cat) <- rat

#Stack raster layers
t.stack <- stack(c(tt,ratr_cat),RAT = TRUE)

#Make sure names in stack match training dataset
names(t.stack)<-c('Sepal.Length','Sepal.Width', 'Petal.Length', 'Petal.Width','Cat')

#Ensure that categorical layer still has RAT and is a factor
t.stack[['Cat']] #yep
is.factor(t.stack[['Cat']]) #yep

#Predict new data using model
mod_pred <- predict(t.stack, mod)
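One way to narrow down an error like this is to compare the column classes of the training predictors with the classes of the values the model receives at prediction time. The sketch below uses base R only. The `cells` data frame is a hypothetical stand-in for the cell values of the stack, built on the assumption that the categorical layer hands over its numeric IDs (1 and 2) and not the RAT labels; the thread does not confirm what `predict` passes to the model.

```r
data("iris")

# Training predictors, built as in the example above
iris.x <- iris[, 1:4]
iris.x$Cat <- "Low"
iris.x$Cat[1:60] <- "High"
iris.x$Cat <- as.factor(iris.x$Cat)

# Hypothetical stand-in for cell values taken from the stack (assumption):
# the categorical layer contributes numeric IDs, not the labels in the RAT
cells <- data.frame(Sepal.Length = c(1, 3), Sepal.Width = c(2, 4),
                    Petal.Length = c(1, 5), Petal.Width = c(2, 2),
                    Cat = c(1, 2))

# First class of every column in both data frames
train_class <- sapply(iris.x, function(col) class(col)[1])
cell_class  <- sapply(cells,  function(col) class(col)[1])

# Names of the columns whose classes differ
mismatch <- names(train_class)[train_class != cell_class]
print(mismatch)
```

If a column shows up in `mismatch`, the model was trained on one type for that predictor and is being offered another.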
loki
jlab
  • Can you please share the output of `print(RF_model)`? As it stands, this is neither a [minimal](https://stackoverflow.com/help/mcve) nor a [reproducible](https://stackoverflow.com/q/5963269/3250126) example. Please try to keep things simple. – loki Feb 02 '18 at 12:57
  • @loki, please see my edit for a minimal reproducible example. Thanks – jlab Feb 02 '18 at 13:48

1 Answer


The factor RasterLayer (the layer with the attribute table) seems to be, or at least to be handled like, an ordered factor. So you just have to train the model with an ordered factor. You can achieve this by changing one line:

iris.x$Cat<- ordered(as.character(iris.x$Cat), levels = c("Low", "High"))
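To see what that one line changes, here is a small base R sketch comparing the original `as.factor` call with the `ordered` call. The variable names (`cat_chr`, `f_default`, `f_ordered`) are made up for illustration. Reading the result as "the integer codes now line up with the raster cell IDs in the RAT" is my interpretation, not something confirmed in this thread.

```r
# Same labels as the training column: 60 "High" followed by 90 "Low"
cat_chr <- c(rep("High", 60), rep("Low", 90))

# Original approach: unordered factor, levels sorted alphabetically
f_default <- as.factor(cat_chr)

# Approach from the answer: ordered factor with explicit level order
f_ordered <- ordered(cat_chr, levels = c("Low", "High"))

print(levels(f_default))
print(levels(f_ordered))
print(is.ordered(f_default))
print(is.ordered(f_ordered))

# Integer codes behind the first ("High") and last ("Low") observation
print(as.integer(f_default)[c(1, 150)])
print(as.integer(f_ordered)[c(1, 150)])
```

With the explicit level order, "Low" is coded as 1 and "High" as 2, which is the same pairing the RAT in the question assigns to the cell values 1 and 2.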
loki
  • @loki, ah yes, I see it works now, thanks. For clarification, `is.ordered(ratr_cat)` returns `FALSE`, so how can I determine whether a `RasterLayer` is ordered, for future reference? Also, I have other categorical variables with numerous levels that I would like to include in the model, but they are not ordinal factors. I would just convert them to unordered with `factor( x , ordered = FALSE )`, but the `RasterLayer` object does not report that it is ordered to begin with. – jlab Feb 02 '18 at 19:25
  • Neither `is.ordered(t.stack[['Cat']]@data@isfactor)` nor `is.ordered(t.stack[['Cat']]@data@attributes[[1]]$PCN_test2)` returns `TRUE`. Strange, how is the train function reading this `RasterLayer` as ordered? – jlab Feb 03 '18 at 08:22
  • That is, indeed, strange. Unfortunately, I also found nothing that explains it. The only idea would be to extract the order of the levels from `levels(t.stack[['Cat']])`. – loki Feb 03 '18 at 17:33
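The idea in the last comment, taking the level order from the RAT, could be sketched roughly as follows. To keep the example runnable without the raster package, `rat` is a hypothetical data frame standing in for `levels(t.stack[['Cat']])[[1]]`, with the columns from the question (`ID`, `PCN`, `PCN_level`). It also assumes the IDs are consecutive integers starting at 1, as in the question. Note that base R's `is.ordered` only checks whether an object inherits from class "ordered", so it cannot say anything about a `RasterLayer` or a logical flag.

```r
# Hypothetical stand-in for levels(t.stack[['Cat']])[[1]] (assumption)
rat <- data.frame(ID = c(1, 2), PCN = c(1, 2),
                  PCN_level = c("Low", "High"),
                  stringsAsFactors = FALSE)

# Labels sorted by cell ID give the level order to train with
rat_levels <- rat$PCN_level[order(rat$ID)]

# Training labels as in the question: 60 "High" followed by 90 "Low"
cat_chr <- c(rep("High", 60), rep("Low", 90))
train_cat <- ordered(cat_chr, levels = rat_levels)

# Check that each label's integer code equals its cell ID in the RAT
codes_match <- all(rat$ID[match(cat_chr, rat$PCN_level)] == as.integer(train_cat))
print(rat_levels)
print(codes_match)
```

Building the training factor from `rat_levels` avoids hard-coding `c("Low", "High")` and keeps the training data and the raster in step if the RAT changes.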