I am trying to use the predict function to get predictions from a logistic regression, but the result has the wrong number of rows. This has already been asked in "R Warning: newdata' had 15 rows but variables found have 22 rows",
and I have tried the approach suggested there, but I still get the warning. Here is the code:
# Split as training and test sets
train_idx <- trainTestSplit(adult,trainPercent=75,seed=1111)
train <- adult[train_idx, ]
test <- adult[-train_idx, ]
xtrain <- train[,1:7]
ytrain <- train[,8]
xtrain1 <- dummy.data.frame(xtrain, sep = ".")
xtrain2 <- as.matrix(xtrain1)
xtest <- test[,1:7]
ytest <- test[,8]
xtest1 <- dummy.data.frame(xtest, sep = ".")
xtest2 <- as.matrix(xtest1)
fit=glm(ytrain~xtrain2,family=binomial)
a=predict(fit,newdata=xtrain1,type="response")
b=ifelse(a>0.5,1,0)
confusionMatrix(b,ytrain)
Confusion Matrix and Statistics

          Reference
Prediction     0     1
         0 16065  3157
         1   968  2430

               Accuracy : 0.8176
                 95% CI : (0.8125, 0.8227)
# Predict with test dataframe
a=predict(fit,xtest1,type="response")
Warning messages:
1: 'newdata' had 7541 rows but variables found have 22620 rows
2: In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading
I also tried setting the column names explicitly:
names(xtest1)=names(xtrain1)
a=predict(fit,xtest1,type="response")
The column names were already identical, and I still get the same warning. This behaviour seems very counterintuitive. Please help.
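
For reference, here is a minimal sketch that seems to reproduce the same warning on simulated data (not the adult dataset). The data frame `df`, its columns `x1`, `x2`, `y`, and the objects `fit_matrix` and `fit_df` are made up for illustration. My understanding, which may be incomplete, is that when a matrix such as `xtrain2` is placed directly in the formula, the only predictor name the model knows is `xtrain2` itself, so predict() cannot find it in newdata and falls back to the training matrix. Fitting with named columns and a `data` argument appears to avoid this:

```r
# Simulated data; all names here are hypothetical, chosen only for illustration.
set.seed(1)
n <- 200
df <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
df$y <- rbinom(n, 1, plogis(0.5 * df$x1 + (df$x2 == "b")))

train <- df[1:150, ]
test  <- df[151:200, ]

# Pattern from the question: a matrix on the right-hand side of the formula.
# The model's only predictor term is the object `xtrain2`, so predict()
# looks for a variable called `xtrain2` in newdata, does not find it there,
# and picks up the training matrix from the calling environment instead.
xtrain2 <- model.matrix(~ x1 + x2, train)[, -1]
fit_matrix <- glm(train$y ~ xtrain2, family = binomial)
p_bad <- suppressWarnings(
  predict(fit_matrix, newdata = test, type = "response")
)
length(p_bad)   # 150: one value per training row, not per test row

# Alternative: refer to columns by name and pass the data frame via `data`,
# so predict() can look up the same column names in newdata.
fit_df <- glm(y ~ x1 + x2, data = train, family = binomial)
p_test <- predict(fit_df, newdata = test, type = "response")
length(p_test)  # 50: one value per test row
```

With the second form, glm() builds the dummy columns for the factor itself, so the separate dummy.data.frame and as.matrix steps would presumably not be needed.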