I'm a beginner in R and am working with customer churn data. I built two classification models (a logistic regression and a decision tree) and tried to evaluate their performance with 10-fold cross-validation, but something is wrong with my code below:
"""
setwd("H:/R")
source("cutoff-plot.R")
source("classification-metrics.R")
library(tree)
negative.label <- "no"
positive.label <- "yes"
class.labels <- c(negative.label,positive.label)
data.set <- read.csv("churn.csv")
data.set$Churn <- factor(
  as.numeric(data.set$Churn == positive.label),
  levels = 0:1, labels = class.labels)
f <- Churn ~ .
n.folds <- 10
fold.idx <- sample(rep(1:n.folds, length=nrow(data.set)))
p.linear <- rep(NA, nrow(data.set))
p.tree <- rep(NA,nrow(data.set))
for (k in 1:n.folds) {
  fold <- which(fold.idx == k)
  linear.model <- glm(f, data.set[-fold, ], family = binomial)
  tree.model <- tree(f, data.set[-fold, ])
  p.linear[fold] <- predict(linear.model, data.set[fold, ])
  p.tree[fold] <- predict(tree.model, data.set[fold, ])
}
yhat.linear <- compute.yhat(p.linear,threshold=0.14)
yhat.tree <- compute.yhat(p.tree,threshold=0.08)
y <- data.set$Churn
linear.stats <- summary.stats(y, yhat.linear)
tree.stats <- summary.stats(y, yhat.tree)
linear.stats
tree.stats
cutoff.plot(p.linear,y)
cutoff.plot(p.tree,y)
"""
The problem is that after running the for loop, for (k in 1:n.folds) { ... }, I get one warning per fold:
Warning messages:
1: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
2: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
3: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
4: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
5: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
6: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
7: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
8: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
9: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
10: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
number of items to replace is not a multiple of replacement length
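
A likely cause, sketched in base R so it runs without the tree package: for a classification tree, predict() from the tree package returns by default a matrix of class probabilities with one column per class level, so each fold supplies twice as many values as there are slots in p.tree[fold]. The matrix below (prob.matrix) is a made-up stand-in for that return value, not real output from the churn data, and the names n, p.yes and res are only for illustration.

```r
# Hypothetical stand-in for predict(tree.model, data.set[fold, ]):
# one row per observation, one column per class level ("no", "yes").
set.seed(1)
n <- 5
p.yes <- runif(n)
prob.matrix <- cbind(no = 1 - p.yes, yes = p.yes)

p.tree <- rep(NA_real_, 10)
fold <- 1:n

# Assigning the whole matrix (10 values) into 5 slots triggers the warning.
# tryCatch captures the warning text instead of printing it.
res <- tryCatch({
  p.tree[fold] <- prob.matrix
  "no warning"
}, warning = function(w) conditionMessage(w))

# Keeping only the positive-class column gives exactly one value per row.
p.tree[fold] <- prob.matrix[, "yes"]
```

If that is what is happening here, the corresponding change in the loop would be something like p.tree[fold] <- predict(tree.model, data.set[fold, ])[, positive.label]. Separately, predict() on a glm returns values on the link (log-odds) scale unless type = "response" is given, so thresholds such as 0.14 only make sense as probabilities if that argument is added to the p.linear line.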