
I am trying to plot decision boundaries for different models. I came across the following SO post here.

I am trying to implement this using the iris dataset (not the iris3 dataset). I would also like to extend it so that I can apply it to other ML models (and not just the knn model). In the link above, the author plots the training values; I would like to plot the test values.

In my attempt at replicating that post's answer using the iris data, I can only get as far as:

library(dplyr)  # %>%, mutate(), sample_frac(), anti_join()
library(class)  # knn()

iris <- iris %>% 
  mutate(
    Species = factor(Species),
    ID = row_number()
  )

iris_train <- iris %>% 
  sample_frac(0.75)

iris_test <- anti_join(iris, iris_train, by = "ID")

knn_model <- knn(train = iris_train[, 1:4], iris_test[,1:4],
                 cl = iris_train$Species, k = 3, prob = TRUE)

prob = attr(knn_model, "prob")

grid <- expand.grid(x=seq(min(iris_train[,1]-1), max(iris_train[,1]+1),
                          by=0.1),
                    y=seq(min(iris_train[,2]-1), max(iris_train[,2]+1), 
                          by=0.1))

knnPredGrid <- predict(knn_model, grid)

The last line gives me:

Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "factor"
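
A note on this error: `class::knn()` does not return a fitted model. It returns a factor holding the predicted classes for whatever was passed as `test`, so there is nothing for `predict()` to dispatch on. One way around this, sketched below, is to call `knn()` a second time with the grid as `test`. The sketch makes some assumptions that are not in the code above: a fixed seed, a base-R split instead of the dplyr one, and only the two sepal variables, so that the 2-D grid has the same columns as the training data.

```r
library(class)

set.seed(42)  # assumption: fixed seed so the split is reproducible
train_idx  <- sample(nrow(iris), size = floor(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

# Two features only, so the grid and the training data share the same columns
vars <- c("Sepal.Length", "Sepal.Width")

grid <- expand.grid(
  Sepal.Length = seq(min(iris_train$Sepal.Length) - 1,
                     max(iris_train$Sepal.Length) + 1, by = 0.1),
  Sepal.Width  = seq(min(iris_train$Sepal.Width) - 1,
                     max(iris_train$Sepal.Width) + 1, by = 0.1)
)

# knn() has no separate fit and predict steps: the grid is passed as `test`
grid_pred <- knn(train = iris_train[, vars], test = grid[, vars],
                 cl = iris_train$Species, k = 3, prob = TRUE)

grid$Species <- grid_pred                # predicted class at every grid point
grid$prob    <- attr(grid_pred, "prob")  # vote share of the winning class
```

The same pattern would apply to the test set: call `knn()` once more with `iris_test[, vars]` as `test`.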

Here is the code from the original post, which uses the iris3 dataset:

train <- rbind(iris3[1:25,1:2,1],
               iris3[1:25,1:2,2],
               iris3[1:25,1:2,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))

require(MASS)

test <- expand.grid(x=seq(min(train[,1]-1), max(train[,1]+1),
                          by=0.1),
                    y=seq(min(train[,2]-1), max(train[,2]+1), 
                          by=0.1))

require(class)
classif <- knn(train, test, cl, k = 3, prob=TRUE)
prob <- attr(classif, "prob")

require(dplyr)

dataf <- bind_rows(mutate(test,
                          prob=prob,
                          cls="c",
                          prob_cls=ifelse(classif==cls,
                                          1, 0)),
                   mutate(test,
                          prob=prob,
                          cls="v",
                          prob_cls=ifelse(classif==cls,
                                          1, 0)),
                   mutate(test,
                          prob=prob,
                          cls="s",
                          prob_cls=ifelse(classif==cls,
                                          1, 0)))

require(ggplot2)
ggplot(dataf) +
  geom_point(aes(x=x, y=y, col=cls),
             data = mutate(test, cls=classif),
             size=1.2) + 
  geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
               bins=2,
               data=dataf) +
  geom_point(aes(x=x, y=y, col=cls),
             size=3,
             data=data.frame(x=train[,1], y=train[,2], cls=cl))

I am not sure I understand the code correctly. The test/expand.grid part just helps create the geom_contour points for the plot, right? That is, it expands the data into a long format in which the data points repeat many times, and the knn model just classifies the repeated data points. How can I avoid this, test the knn model on the iris_test data, and then expand the grid to construct the contour points?

How can I expand my code to plot only the iris_test points but use the contour lines from the training data?
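
A minimal sketch of one possible approach, not a definitive answer: classify a grid using the training data to get the contour lines, then draw only the `iris_test` points on top. It assumes a fixed seed, a base-R split, and a model restricted to the two sepal variables. The `contour_df` data frame and its `cls` and `z` columns are names made up for this illustration.

```r
library(class)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the split is reproducible
train_idx  <- sample(nrow(iris), size = floor(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

vars <- c("Sepal.Length", "Sepal.Width")

# Grid spanning the training range; it exists only to draw the boundaries
grid <- expand.grid(
  Sepal.Length = seq(min(iris_train$Sepal.Length) - 1,
                     max(iris_train$Sepal.Length) + 1, by = 0.1),
  Sepal.Width  = seq(min(iris_train$Sepal.Width) - 1,
                     max(iris_train$Sepal.Width) + 1, by = 0.1)
)
grid$Species <- knn(iris_train[, vars], grid[, vars],
                    cl = iris_train$Species, k = 3)

# One 0/1 indicator surface per class; its 0.5 contour is that class's boundary
contour_df <- do.call(rbind, lapply(levels(iris$Species), function(cls) {
  data.frame(grid[, vars], cls = cls, z = as.integer(grid$Species == cls))
}))

# Predictions for the held-out rows, kept for comparison with the true labels
iris_test$Predicted <- knn(iris_train[, vars], iris_test[, vars],
                           cl = iris_train$Species, k = 3)

p <- ggplot() +
  geom_contour(data = contour_df,
               aes(x = Sepal.Length, y = Sepal.Width, z = z,
                   group = cls, colour = cls),
               breaks = 0.5) +
  geom_point(data = iris_test,
             aes(x = Sepal.Length, y = Sepal.Width, colour = Species),
             size = 3)
print(p)
```

The points are coloured by their true species, so a test point lying on the wrong side of a contour is a misclassification. Swapping `colour = Species` for `colour = Predicted` would show the model's own labels instead.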

How can I extend this to another model, such as a random forest?

Overall, can somebody show me, in an easy-to-follow way, how to replicate the plots on the SO post's answers page using the iris data instead of the iris3 data? I would like to build the knn model using all 4 iris variables instead of the original post's 2 variables. I would also like to see how different combinations of the iris variables look when plotted against the decision boundaries; in its current format, the code is a little difficult for me to follow.

randomForest(factor(Species) ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = iris_train)
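
Unlike `knn()`, `randomForest()` returns a model object, so `predict(model, newdata = grid)` does work for it. A rough sketch for a model trained on all four variables follows. A 2-D plot can only vary two of them, so this sketch holds the other two at their training medians. That is an assumption of the sketch, and other choices would give different boundaries. `make_grid()` is a hypothetical helper written for this illustration, not a function from any package.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed so the split and the forest are reproducible
train_idx  <- sample(nrow(iris), size = floor(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

rf_model <- randomForest(Species ~ Sepal.Length + Sepal.Width +
                           Petal.Length + Petal.Width,
                         data = iris_train)

# Hypothetical helper: grid over two chosen variables, with every other
# predictor fixed at its median in the training data
make_grid <- function(data, xvar, yvar, by = 0.1) {
  predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
  grid <- expand.grid(
    x = seq(min(data[[xvar]]) - 1, max(data[[xvar]]) + 1, by = by),
    y = seq(min(data[[yvar]]) - 1, max(data[[yvar]]) + 1, by = by)
  )
  names(grid) <- c(xvar, yvar)
  for (v in setdiff(predictors, c(xvar, yvar))) {
    grid[[v]] <- median(data[[v]])
  }
  grid
}

grid <- make_grid(iris_train, "Petal.Length", "Petal.Width")
grid$Species <- predict(rf_model, newdata = grid)       # classes for the contours
test_pred    <- predict(rf_model, newdata = iris_test)  # classes for the test rows
```

Calling `make_grid()` with a different pair of column names gives the grid for another combination of variables, and `grid$Species` can then be turned into contour lines in the same way as the knn predictions.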
user8959427
  • That's strange, it doesn't for me. The second part of the code was taken from the answer given here: https://stackoverflow.com/questions/31234621/variation-on-how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-f – user8959427 Jan 23 '20 at 19:08
  • You may check [here](https://stackoverflow.com/questions/50632410/how-do-i-use-knn-model-for-new-data-in-r) – akrun Jan 23 '20 at 19:37
