I am trying to plot decision boundaries for different models. I came across the following SO post here, and I am trying to implement it using the iris dataset (not the iris3 dataset). I would also like to extend this so that I can apply it to other ML models (not just the knn model). In the linked post the author plots the training values; I would like to plot the test values.
In my attempt at replicating that post's answer using the iris data, I can only get as far as:
library(dplyr)  # mutate(), sample_frac(), anti_join()
library(class)  # knn()

# Add a row ID so the test set can be recovered with anti_join()
iris <- iris %>%
  mutate(
    Species = factor(Species),
    ID = row_number()
  )
iris_train <- iris %>%
  sample_frac(0.75)
iris_test <- anti_join(iris, iris_train, by = "ID")
knn_model <- knn(train = iris_train[, 1:4], test = iris_test[, 1:4],
                 cl = iris_train$Species, k = 3, prob = TRUE)
prob <- attr(knn_model, "prob")
grid <- expand.grid(x = seq(min(iris_train[, 1] - 1), max(iris_train[, 1] + 1),
                            by = 0.1),
                    y = seq(min(iris_train[, 2] - 1), max(iris_train[, 2] + 1),
                            by = 0.1))
# This is the call that fails
knnPredGrid <- predict(knn_model, grid)
The last line gives:
Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "factor"
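My current understanding of the error, as a minimal sketch: class::knn() returns the predicted classes as a factor rather than a fitted model object, so there is nothing for predict() to dispatch on. The points to classify are passed directly as the test argument instead. The sketch below assumes a base-R split with a fixed seed (set.seed(42) is my choice, not from the original post) and uses only the first two columns so that the grid and the training data have the same number of features.

```r
library(class)  # knn()

set.seed(42)  # assumption: fixed seed so the split is reproducible
train_idx  <- sample(nrow(iris), size = round(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

# Regular grid over the first two features (Sepal.Length, Sepal.Width)
grid <- expand.grid(
  Sepal.Length = seq(min(iris_train[, 1]) - 1, max(iris_train[, 1]) + 1, by = 0.1),
  Sepal.Width  = seq(min(iris_train[, 2]) - 1, max(iris_train[, 2]) + 1, by = 0.1)
)

# knn() has no separate fit/predict step: the grid goes in as `test`
grid_pred <- knn(train = iris_train[, 1:2], test = grid,
                 cl = iris_train$Species, k = 3, prob = TRUE)
grid_prob <- attr(grid_pred, "prob")  # vote share of the winning class
```

Here grid_pred holds one predicted class per grid row, which is what the contour step in the original post needs.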
Here is the code from the original post, which uses the iris3 dataset:
train <- rbind(iris3[1:25,1:2,1],
               iris3[1:25,1:2,2],
               iris3[1:25,1:2,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
require(MASS)
test <- expand.grid(x=seq(min(train[,1]-1), max(train[,1]+1),
                          by=0.1),
                    y=seq(min(train[,2]-1), max(train[,2]+1),
                          by=0.1))
require(class)
classif <- knn(train, test, cl, k = 3, prob=TRUE)
prob <- attr(classif, "prob")
require(dplyr)
dataf <- bind_rows(mutate(test,
                          prob=prob,
                          cls="c",
                          prob_cls=ifelse(classif==cls,
                                          1, 0)),
                   mutate(test,
                          prob=prob,
                          cls="v",
                          prob_cls=ifelse(classif==cls,
                                          1, 0)),
                   mutate(test,
                          prob=prob,
                          cls="s",
                          prob_cls=ifelse(classif==cls,
                                          1, 0)))
require(ggplot2)
ggplot(dataf) +
  geom_point(aes(x=x, y=y, col=cls),
             data = mutate(test, cls=classif),
             size=1.2) +
  geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
               bins=2,
               data=dataf) +
  geom_point(aes(x=x, y=y, col=cls),
             size=3,
             data=data.frame(x=train[,1], y=train[,2], cls=cl))
I am not sure I understand the code correctly. The test/expand.grid part just helps create the geom_contour points for the plot, right? That is, it expands the data into a long format in which the data points repeat many times, and the knn model just classifies those repeated points. How can I avoid this, test the knn model on the iris_test data, and then expand the grid to construct the contour points?
How can I extend my code to plot only the iris_test points but use the contour lines from the training data?
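To make this concrete, here is a rough sketch of what I have in mind, not a working solution to the whole question. It uses base graphics instead of ggplot2 to stay dependency-free, a model trained on only two columns, and a seed and column pair (vars) that are my own assumptions. The grid is classified with the training data to get the boundary, and only the iris_test points are drawn.

```r
library(class)  # knn()

set.seed(42)  # assumption: fixed seed so the split is reproducible
train_idx  <- sample(nrow(iris), size = round(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]
vars <- c("Petal.Length", "Petal.Width")  # assumption: any pair of columns

# Grid covering the training range of the two chosen variables
xs <- seq(min(iris_train[[vars[1]]]) - 1, max(iris_train[[vars[1]]]) + 1, by = 0.1)
ys <- seq(min(iris_train[[vars[2]]]) - 1, max(iris_train[[vars[2]]]) + 1, by = 0.1)
grid <- expand.grid(x = xs, y = ys)

# Both calls use the same training data; only the points to classify differ
grid_cls <- knn(iris_train[, vars], grid, cl = iris_train$Species, k = 3)
test_cls <- knn(iris_train[, vars], iris_test[, vars],
                cl = iris_train$Species, k = 3)

# expand.grid() varies x fastest, so filling column-wise gives rows = x, cols = y
z <- matrix(as.integer(grid_cls), nrow = length(xs), ncol = length(ys))

# Test points coloured by true species, with the boundaries from the grid on top
plot(iris_test[[vars[1]]], iris_test[[vars[2]]],
     col = as.integer(iris_test$Species), pch = 19,
     xlab = vars[1], ylab = vars[2])
contour(xs, ys, z, levels = c(1.5, 2.5), drawlabels = FALSE, add = TRUE)
```

The contour levels 1.5 and 2.5 sit between the integer codes of the three species, so the lines fall where the predicted class changes.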
How can I extend this to another model, such as randomForest?
Overall, can somebody show me, in an easy-to-follow way, how to replicate the plots from the answers to that SO post using the iris data rather than the iris3 data? I would like to build the knn model using all 4 iris variables instead of the original post's 2 variables. I would also like to see how different combinations of the iris variables look when plotted against the decision boundaries, and in the current format it is a little difficult for me to follow how to do this.
# Random forest model I would like to plot in the same way
library(randomForest)
randomForest(factor(Species) ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = iris_train)
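For the 4-variable case, one approach I can imagine (an assumption on my part, not something from the linked post) is to plot a two-dimensional slice: build the grid over one pair of variables and hold the other two at their training medians, so the grid has the same four columns as the model. The helper make_slice_grid() below is hypothetical, as is the fixed seed.

```r
library(class)  # knn()

set.seed(42)  # assumption: fixed seed so the split is reproducible
train_idx  <- sample(nrow(iris), size = round(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
features   <- names(iris)[1:4]

# Hypothetical helper: grid over `pair`, remaining features fixed at their medians
make_slice_grid <- function(data, features, pair, step = 0.1) {
  grid <- expand.grid(
    seq(min(data[[pair[1]]]) - 1, max(data[[pair[1]]]) + 1, by = step),
    seq(min(data[[pair[2]]]) - 1, max(data[[pair[2]]]) + 1, by = step)
  )
  names(grid) <- pair
  for (f in setdiff(features, pair)) grid[[f]] <- median(data[[f]])
  grid[, features]  # same column order as the training data
}

# Classify one slice per pair of features (6 pairs for 4 variables)
pairs <- combn(features, 2, simplify = FALSE)
slice_preds <- lapply(pairs, function(pair) {
  grid <- make_slice_grid(iris_train, features, pair)
  grid$pred <- knn(iris_train[, features], grid[, features],
                   cl = iris_train$Species, k = 3)
  grid
})
names(slice_preds) <- vapply(pairs, paste, character(1), collapse = " vs ")
```

Each element of slice_preds is a grid with a pred column that could feed the contour step. For a model that does have a predict() method, such as randomForest, the same grid could presumably be passed as newdata instead of calling knn().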