I am running this function to do n-fold cross-validation. The misclassification rate does not change with the number of folds, e.g. whether I use 10 or 50. I am also getting a warning:
"Warning message:
'newdata' had 19 rows but variables found have 189 rows"
If I run the code outside of a function, it does what I want: e.g. for folds == 1, it pulls out 10% of the data, fits the model on the other 90%, and predicts the held-out 10%. Does anyone have any idea why, inside the function, the rate does not vary with the variable or with the number of folds?
library("MASS")
data(birthwt)
data=birthwt
n.folds=10
jim = function(x, y, n.folds, data){
  for(i in 1:n.folds){
    folds <- cut(seq(1, nrow(data)), breaks=n.folds, labels=FALSE)
    testIndexes <- which(folds==i, arr.ind=TRUE)
    testData <- data[testIndexes, ]
    trainData <- data[-testIndexes, ]
    glm.train <- glm(y ~ x, family = binomial, data=trainData)
    predictions = predict(glm.train, newdata=testData, type='response')
    pred.class = ifelse(predictions < 0, 0, 1)
  }
  rate = sum(pred.class != y) / length(y)
  print(head(rate))
}
jim(birthwt$smoke, birthwt$low, 10, birthwt)
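
For comparison, here is a minimal sketch of the procedure described above (hold out one fold, fit on the rest, predict the held-out rows). It is not a definitive implementation, and it makes several assumptions that differ from the code above: the function name cv_misclass is hypothetical, the columns are passed by name so that glm() and predict() look them up in the data frames they are given, the class cutoff is 0.5 on the predicted probability, and the rate is computed inside the loop against the held-out rows only. The folds are contiguous blocks of rows, as in the original; if the rows are ordered in some way, shuffling them first may be advisable.

```r
library(MASS)  # provides the birthwt data set

# Hypothetical helper (not from the original code): k-fold cross-validation
# for a one-predictor logistic regression, with columns passed by name.
cv_misclass <- function(response, predictor, n.folds, data) {
  # Assign each row to one of n.folds contiguous folds, once, before the loop
  folds <- cut(seq_len(nrow(data)), breaks = n.folds, labels = FALSE)
  # Build the formula from column names, e.g. low ~ smoke
  form <- reformulate(predictor, response = response)
  rates <- numeric(n.folds)
  for (i in seq_len(n.folds)) {
    test.idx  <- which(folds == i)
    testData  <- data[test.idx, ]
    trainData <- data[-test.idx, ]
    fit  <- glm(form, family = binomial, data = trainData)
    prob <- predict(fit, newdata = testData, type = "response")
    # Assumed cutoff: classify as 1 when the predicted probability exceeds 0.5
    pred.class <- ifelse(prob > 0.5, 1, 0)
    # Misclassification rate on the held-out fold only
    rates[i] <- mean(pred.class != testData[[response]])
  }
  rates
}

rates <- cv_misclass("low", "smoke", 10, birthwt)
print(rates)        # one rate per fold
print(mean(rates))  # average over folds
```

Returning the per-fold vector, rather than printing a single number, makes it easier to see whether the rate actually changes from fold to fold.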