
I want to fit a logistic regression. I get the following error after running my code in R:

Something is wrong; all the Accuracy metric values are missing:

    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :9     NA's   :9    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 19 warnings (use warnings() to see them)

Here is my code:

## Data
donner <- read.delim("http://web.as.uky.edu/statistics/users/pbreheny/760/data/donner.txt")
set.seed(1234)
library(caret)
donner$Age <- as.numeric(donner$Age)
donner$Status <- as.factor(donner$Status)  
donner$Sex <- as.numeric(donner$Sex) 
splitIndex <- createDataPartition(donner$Status, p = .80, list = FALSE, times = 1)
trainDF <- donner[splitIndex,]
testDF <- donner[-splitIndex,]
ctrl <- trainControl(method = "cv", number = 2)
logregmodel <- train(Status ~ ., data = donner, method = "logreg", trControl = ctrl)

EDIT 1:

I changed the status to binary (0 and 1) and I still have some errors. Here is the new code:

## Data
donner <- read.delim("http://web.as.uky.edu/statistics/users/pbreheny/760/data/donner.txt")
set.seed(1234)
library(caret)
donner$Age <- as.numeric(donner$Age)
donner$Status <- as.integer(donner$Status)-1  
donner$Sex <- as.numeric(donner$Sex)-1 
splitIndex <- createDataPartition(donner$Status, p = .80, list = FALSE, times = 1)
trainDF <- donner[splitIndex,]
testDF <- donner[-splitIndex,]
ctrl <- trainControl(method = "cv", number = 2)
donner$Status <- as.factor(donner$Status)
logregmodel <- train(Status ~ ., data = donner, method = "logreg", trControl = ctrl)
Hack-R
Louis-Math
  • Are you sure you are fitting the model right? If you check the error, it also says there are non-binary arguments among the predictors: https://github.com/topepo/caret/blob/master/RegressionTests/Code/logreg.R – Sumedh Jun 22 '16 at 03:33
  • I want to fit a logistic regression model. The outputs can only take two values. – Louis-Math Jun 22 '16 at 03:50
  • You are correct about the output. But look carefully at @Hack-R's code: the predictor `Age` is changed to binary. I don't see that in your edit. – Sumedh Jun 22 '16 at 04:26
  • The problem is that I lose some information if I don't keep the age as numeric. Normally, I should be able to use continuous variables for logistic regression. How can I do this? – Louis-Math Jun 22 '16 at 05:15
  • Oh okay, so you didn't really want `logreg` (LOGIC regression); you just wanted a logit model. – Hack-R Jun 23 '16 at 15:46

2 Answers


You just need to fix your data. I'm assuming you want logic regression, since you called the logic regression (`logreg`) method. If you want something else, such as a logit model, this whole question is beside the point, because a logit model would never give you this error in the first place. Logic regression is for binary variables only, and it doesn't understand that 1's and 2's can represent binary data. It wants literal 0's and 1's.

donner <- read.delim("http://web.as.uky.edu/statistics/users/pbreheny/760/data/donner.txt")
set.seed(1234)
library(caret)
donner$Age <- as.numeric(donner$Age)
donner$Status <- as.factor(donner$Status)  
donner$Sex <- as.numeric(donner$Sex) 
splitIndex <- createDataPartition(donner$Status, p = .80, list = FALSE, times = 1)
trainDF <- donner[splitIndex,]
testDF <- donner[-splitIndex,]
ctrl <- trainControl(method = "cv", number = 3)
donner$Status <- as.character(donner$Status)
donner$Status[!donner$Status == "Survived"] <- 0
donner$Status[donner$Status == "Survived"] <- 1
# Flag ages above the mean; compare Age itself, not the new all-zero column
donner$Age_gr_mean <- 0
donner$Age_gr_mean[donner$Age > mean(donner$Age)] <- 1
donner$Age <- NULL
donner$Status <- as.numeric(donner$Status)
donner$Sex[donner$Sex == 2] <- 0
logregmodel <- train(Status ~ ., data = donner, method = "logreg", trControl = ctrl)
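
The recoding step can also be checked on its own, without caret. Here is a minimal sketch in base R on a made-up toy data frame: the values are invented and only the column names mirror the Donner data. It recodes every column to literal 0/1 and then verifies that nothing non-binary is left, which is the condition the "non-binary arguments" error is about.

```r
# Hypothetical toy data standing in for the Donner data (values are made up).
toy <- data.frame(
  Age    = c(23, 40, 15, 28, 46, 30),
  Sex    = c("Male", "Female", "Male", "Female", "Male", "Female"),
  Status = c("Died", "Survived", "Survived", "Survived", "Died", "Died"),
  stringsAsFactors = FALSE
)

# Recode the outcome and every predictor to literal 0/1 values.
toy$Status      <- as.integer(toy$Status == "Survived")
toy$Sex         <- as.integer(toy$Sex == "Male")
toy$Age_gr_mean <- as.integer(toy$Age > mean(toy$Age))  # 1 if older than the mean age
toy$Age         <- NULL                                  # drop the continuous column

# TRUE only if every remaining column contains nothing but 0s and 1s.
all_binary <- all(vapply(toy, function(col) all(col %in% c(0L, 1L)), logical(1)))
print(all_binary)
```

Note that dichotomizing `Age` like this throws away information, so it only makes sense if logic regression is really what you want.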
Hack-R
  • I tried to run your code and my R console froze. However, I tried to implement your suggestion and I still have an error. I will edit the question. – Louis-Math Jun 22 '16 at 04:01
  • @user6496985 I didn't have any errors; the model completed. If your console froze, that's because you need to increase CPU/RAM. There are a number of ways to do this. Make sure you're using 64-bit. I ran this on an 8GB RAM, dual-core CPU laptop (while watching a movie on Google Play), so I don't think the resource requirements are too much for most laptops with proper settings in R to maximize allowable RAM usage. – Hack-R Jun 22 '16 at 04:24

I've personally never used the "logreg" method. Some of the lines also seem unnecessary. Here is my suggestion using "glm" as the method.

## Data
donner <- read.delim("http://web.as.uky.edu/statistics/users/pbreheny/760/data/donner.txt")

set.seed(1234)
library(caret)

donner$Age <- as.numeric(donner$Age)
donner$Status <- as.factor(donner$Status)
donner$Sex <- as.numeric(donner$Sex)-1 

splitIndex <- createDataPartition(donner$Status, p = .80, list = FALSE, times = 1)
trainDF <- donner[splitIndex,]
testDF <- donner[-splitIndex,]

ctrl <- trainControl(method = "cv", number = 3)
logregmodel <- train(Status ~ ., data = trainDF, method = "glm", family='binomial', trControl = ctrl)

summary(logregmodel)
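
To show that a logit model is happy with a continuous predictor, here is a minimal sketch using base R's `glm()` directly, without caret and without downloading anything. The data are simulated: the sample size, the coefficients inside `plogis()`, and the data frame name `sim` are all assumptions made up for illustration, not estimates from the Donner data.

```r
set.seed(1234)

# Simulated stand-in data: a continuous Age and a 0/1 Sex (made-up values).
n   <- 200
sim <- data.frame(
  Age = runif(n, min = 15, max = 65),
  Sex = rbinom(n, size = 1, prob = 0.5)
)

# Assumed (invented) true relationship used only to generate the outcome.
p_survive  <- plogis(1.5 - 0.05 * sim$Age - 0.8 * sim$Sex)
sim$Status <- factor(
  ifelse(rbinom(n, size = 1, prob = p_survive) == 1, "Survived", "Died"),
  levels = c("Died", "Survived")
)

# Logistic regression: Age stays numeric, no dichotomizing needed.
fit <- glm(Status ~ Age + Sex, data = sim, family = binomial)

# Predicted probabilities of the second factor level ("Survived").
prob <- predict(fit, newdata = sim, type = "response")

print(coef(fit))
```

With a factor outcome, `glm()` models the probability of the second level, which is why the levels are set explicitly above.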
Hack-R
Zaxcie