I'm a beginner in R and am working with customer churn data. I built a classification model and then tried to use cross-validation to evaluate its performance, but there is something wrong with my code, shown below:

"""

setwd("H:/R")
source("cutoff-plot.R")
source("classification-metrics.R")
library(tree)
negative.label <- "no"
positive.label <- "yes"
class.labels <- c(negative.label,positive.label)
data.set <- read.csv("churn.csv")
data.set$Churn <- factor(
as.numeric(data.set$Churn==positive.label),
levels=0:1, labels=class.labels)
f <- Churn ~ .
n.folds <- 10
fold.idx <- sample(rep(1:n.folds, length=nrow(data.set)))
p.linear <- rep(NA, nrow(data.set))
p.tree <- rep(NA,nrow(data.set))
for (k in 1:n.folds) {
  fold <- which (fold.idx == k)
  linear.model <- glm(f, data.set[-fold,],family=binomial)
  tree.model <- tree(f, data.set[-fold,])
  p.linear[fold] <- predict(linear.model,data.set[fold, ]) 
  p.tree[fold] <- predict(tree.model,data.set[fold, ])
}
yhat.linear <- compute.yhat(p.linear,threshold=0.14)
yhat.tree <- compute.yhat(p.tree,threshold=0.08)
y <- data.set$Churn
linear.stats <- summary.stats(y, yhat.linear)
tree.stats <- summary.stats(y, yhat.tree)
linear.stats
tree.stats
cutoff.plot(p.linear,y)
cutoff.plot(p.tree,y)

"""

The problem is that after running the for loop, for (k in 1:n.folds) {}, I get the following

Warning messages:

1: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

2: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

3: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

4: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

5: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

6: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

7: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

8: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

9: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
   number of items to replace is not a multiple of replacement length

10: In p.tree[fold] <- predict(tree.model, data.set[fold, ]) :
    number of items to replace is not a multiple of replacement length
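
A likely cause of this warning, assuming `predict()` on a classification tree from the `tree` package returns a matrix of class probabilities with one column per label (its default for a factor response), is that twice as many values are being assigned as there are rows in the fold. The sketch below reproduces the same warning in base R. The matrix `probs` and the toy sizes are illustrative stand-ins, not values from the churn data.

```r
# Six observations, two of them in the current fold.
p.tree <- rep(NA_real_, 6)
fold <- c(2, 5)

# Stand-in for the prediction output of a classification tree:
# one row per observation in the fold, one probability column per class.
probs <- matrix(c(0.9, 0.3, 0.1, 0.7), ncol = 2,
                dimnames = list(NULL, c("no", "yes")))

# Assigning all 4 values into 2 slots raises the warning; capture its text.
bad <- p.tree
msg <- tryCatch({
  bad[fold] <- probs
  NA_character_
}, warning = function(w) conditionMessage(w))
print(msg)

# Keeping only the positive-class column makes the lengths match.
p.tree[fold] <- probs[, "yes"]
print(p.tree)
```

Without the column selection, R would fill the fold with the first values of the matrix (the "no" column) and discard the rest, so the stored numbers would not be the probabilities of churn.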
  • I fixed the for loop and there are no more warnings or errors, but I'm still not sure how to get a new confusion matrix after the loop: for (k in 1:n.folds) { fold <- which (fold.idx == k) linear.model <- glm(f, data.set, family=binomial) tree.model <- tree(f, data.set) p.linear[fold] <- predict(linear.model, data.set[fold, ], type="response") p.tree[fold] <- predict(tree.model,data.set[fold, ])[ ,2] } – Darsolation Oct 03 '15 at 05:44
  • This isn't exactly reproducible. Can you encapsulate all the relevant code and data into your question that demonstrates what you're after? – Roman Luštrik Oct 03 '15 at 07:13
  • Thanks @RomanLuštrik, could I please have your email so that I can send all the scripts to you? What I'm trying to do is this: I've already built a classification model, both a tree and a linear one, and now I want to do the cross-validation. This is the code I wrote for the classification model: – Darsolation Oct 04 '15 at 01:06
  • library(tree) negative.label <- "no" positive.label <- "yes" class.labels <- c(negative.label,positive.label) data.set <- read.csv("churn.csv") data.set$Churn <- factor( as.numeric(data.set$Churn==positive.label), levels=0:1, labels=class.labels) linear.model <- glm(Churn ~ ., data.set, family=binomial) tree.model <- tree(Churn ~ ., data.set) p.linear <- predict(linear.model, data.set, type="response") p.tree <- predict(tree.model,data.set)[ ,2] – Darsolation Oct 04 '15 at 01:06
  • Check this, I think I described it more clearly there: [link](http://stats.stackexchange.com/questions/175383/cross-validation-claasification-for-cutomer-churn-in-r) – Darsolation Oct 04 '15 at 01:27
  • Include all the relevant code and data (preferably simulated) in this post. See [this post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for a few tips on how to do that. – Roman Luštrik Oct 04 '15 at 14:14
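
For the confusion-matrix question raised in the comments, one possible approach is to pool the out-of-fold probabilities, threshold them, and cross-tabulate against the true labels with `table()`. The sketch below uses simulated data in place of `churn.csv`. The columns `x1` and `x2`, the 0.5 cutoff, and the seed are assumptions made only for illustration, and only the `glm` model is shown so that the example runs in base R.

```r
# Simulated stand-in for churn.csv (x1 and x2 are hypothetical predictors).
set.seed(1)
n <- 200
data.set <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
data.set$Churn <- factor(ifelse(data.set$x1 + rnorm(n) > 0.5, "yes", "no"),
                         levels = c("no", "yes"))

f <- Churn ~ .
n.folds <- 10
fold.idx <- sample(rep(1:n.folds, length.out = n))
p.linear <- rep(NA_real_, n)

for (k in 1:n.folds) {
  fold <- which(fold.idx == k)
  # Fit on the other folds only, so each prediction is out-of-sample.
  linear.model <- glm(f, data = data.set[-fold, ], family = binomial)
  # type = "response" returns probabilities rather than log-odds.
  p.linear[fold] <- predict(linear.model, data.set[fold, ], type = "response")
}

# Threshold the pooled out-of-fold probabilities (0.5 is an arbitrary cutoff).
yhat.linear <- factor(ifelse(p.linear > 0.5, "yes", "no"),
                      levels = c("no", "yes"))

# Rows are the true labels, columns are the cross-validated predictions.
confusion <- table(actual = data.set$Churn, predicted = yhat.linear)
print(confusion)
```

Fitting on `data.set[-fold, ]` rather than on the full `data.set` is what makes this a cross-validated estimate; a model fitted on all rows has already seen the fold it is asked to predict. The tree model could presumably be handled in the same loop by keeping only the positive-class column of its prediction matrix.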

0 Answers