
I'm new to statistics and I'm trying to run a stepwise multiple regression with categorical predictors using train() from the caret package, but I don't think I'm doing it correctly. Here is my code:

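# Stepwise multiple regression
library(caret)  # provides train() and trainControl(); "leapBackward" also needs the leaps package

set.seed(123)
# Set up 10-fold cross-validation (method = "cv" is not repeated)
train.control <- trainControl(method = "cv", number = 10)
# Train the model with backward selection, trying models with 1 to 5 variables
step.model <- train(Rebreeding_Score ~ ., data = dfp1,
                    method = "leapBackward",
                    tuneGrid = data.frame(nvmax = 1:5),
                    trControl = train.control)
step.model$results              # cross-validated performance for each nvmax
step.model$bestTune             # nvmax with the best cross-validated performance
summary(step.model$finalModel)  # which columns enter at each model size
coef(step.model$finalModel, 5)  # coefficients of the 5-variable model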

The function seems to select individual categories (levels) within a predictor rather than the predictor as a whole. I hope I'm explaining this correctly...

Output: Result 1 Result 2
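
A minimal sketch of why individual categories can show up as separate candidates, assuming the formula interface expands each factor into 0/1 dummy columns before the selection step sees it. The column names are taken from the question, but the `toy` data frame and all of its values are made up for illustration; `dfp1` itself is not available here.

```r
# Toy data: column names come from the question, the values are invented.
toy <- data.frame(
  Rebreeding_Score = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3),
  Cohort           = factor(c("A", "B", "C", "A", "B", "C")),
  mating_group     = factor(c("g1", "g1", "g2", "g2", "g3", "g3"))
)

# Expand the formula into the numeric design matrix a regression actually uses.
X <- model.matrix(Rebreeding_Score ~ ., data = toy)

# Under R's default treatment contrasts, every non-reference level of a factor
# becomes its own column, so a column-wise search can keep or drop each one.
colnames(X)
# "(Intercept)" "CohortB" "CohortC" "mating_groupg2" "mating_groupg3"
```

With two three-level factors there are four dummy columns rather than two predictors, which is consistent with a subset search reporting specific categories instead of `Cohort` or `mating_group` as a whole.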

Ideally the multiple regression model should look like this.

Rfinal <- lm(Rebreeding_Score ~ Cohort + mating_group, data = dfp1, na.action = na.omit)
summary(Rfinal)

Any help would be greatly appreciated. Thank you.

twlandre
  • 2
    Hi twlandre, it will be easier to help if you provide a sample of `dfp1` so we can execute your code (for example with `dput()`). See [https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Ian Campbell Mar 25 '20 at 22:15
  • 2
    From what you are showing, I think it is working correctly, but I cannot tell for sure because I cannot see any of the data. In general, a regression creates ONE coefficient for a numerical variable, and y changes by that amount per unit change in that x variable. For a categorical variable, each category gets its own coefficient: if an observation belongs to that category, y changes by the amount of the coefficient, and the other categories contribute zero. So only one coefficient is applied per observation, but each category exerts a different amount of influence. So, that is probably right. – sconfluentus Mar 25 '20 at 22:44
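
The point in the comment above can be checked on a small example. This is only a sketch on invented data (the names `toy`, `y`, and `group` are hypothetical), and it assumes R's default treatment contrasts, under which the first level is the reference and is absorbed into the intercept rather than getting a coefficient of its own.

```r
# Invented data: three categories with group means 11, 21 and 31.
toy <- data.frame(
  y     = c(10, 12, 20, 22, 30, 32),
  group = factor(c("a", "a", "b", "b", "c", "c"))
)

fit <- lm(y ~ group, data = toy)

# One coefficient per non-reference category:
# (Intercept) = mean of "a" = 11, groupb = 21 - 11 = 10, groupc = 31 - 11 = 20
coef(fit)

# For an observation in category "b", only the groupb coefficient is applied:
# 11 + 10 = 21
new_obs <- data.frame(group = factor("b", levels = levels(toy$group)))
pred_b <- predict(fit, newdata = new_obs)
pred_b
```

Each observation picks up the intercept plus at most one of the category coefficients, which is why the fitted model lists categories rather than a single term for the factor.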
