
I'm trying to run some tests on my linear regression model with functions such as ols_vif_tol(), ols_test_normality() or durbinWatsonTest(), which only work with lm() models. However, I fitted my model using the train() function from the caret package.

> fitcontrol = trainControl( method = "repeatedcv", number = floor(0.4*nrow(TrainData)), repeats = RepeatsTC, returnResamp = "all", savePredictions = "all")
> BestModel = train(Formula2, data = TrainData, trControl = fitcontrol, method = "lm", metric = "RMSE")

In the end I get this output:

> BestModel
Linear Regression 

10 samples
 1 predictor

No pre-processing
Resampling: Cross-Validated (4 fold, repeated 100 times) 
Summary of sample sizes: 7, 8, 8, 7, 7, 8, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  10.75823  0.8911761  9.660638

Tuning parameter 'intercept' was held constant at a value of TRUE

What I want is this kind of output:

> GoodModel = lm(Formula2, data = FinalData)
> GoodModel

Call:
lm(formula = Formula2, data = FinalData)

Coefficients:
    (Intercept)  Evol.INDUS.PROD  
          4.089            3.908

So even though I used method = "lm", I don't get the same kind of output, which gives me an error when I run my tests.

> ols_test_normality(BestModel)
Error in ols_test_normality.default(BestModel) : y must be numeric
> ols_test_normality(GoodModel)
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.9042         0.1528 
Kolmogorov-Smirnov        0.1904         0.6661 
Cramer-von Mises          1.1026         0.0010 
Anderson-Darling          0.4615         0.2156 
-----------------------------------------------

I know there is an as.lm function, but I tried it and my installed version doesn't support it.

Does anyone know how to get the same form of output as lm() after using train(), or a way to use the output of BestModel to run those tests?

EDIT

Here is a simpler example that produces the same error and that you can use to try different tests.

install.packages("olsrr")
install.packages("caret")
library(olsrr)
library(caret)

first = sample(1:10, 10, rep = TRUE)
second = sample(10:20, 10, rep = TRUE)
third = sample(20:30, 10, rep = TRUE)
Df = data.frame(first, second, third)
Df

#Create a model with lm

Model1 = lm(first ~ second + third, data = Df)
Model1
summary(Model1)
ols_test_normality(Model1)

#Create a model with caret::train

Fold = sample(1:nrow(Df), size = 0.8*nrow(Df), replace = FALSE)
TrainData = Df[Fold,]
TestData = Df[-Fold,]
fitcontrol = trainControl(method = "repeatedcv", number = 2, repeats = 10)
Model2 = train(first ~ second + third, data = TrainData, trControl = fitcontrol, method = "lm")
Model2
summary(Model2)
ols_test_normality(Model2)

Thank you

Tim007
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 01 '21 at 22:31

1 Answer


Your Model2 is a train object, so ols_test_normality will not work on it:

class(Model2)
[1] "train"         "train.formula"

The final lm model is stored under finalModel:

class(Model2$finalModel)
[1] "lm"

ols_test_normality(Model2$finalModel)
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.9843         0.9809 
Kolmogorov-Smirnov        0.149          0.9822 
Cramer-von Mises          0.4212         0.0611 
Anderson-Darling          0.1677         0.9004 
-----------------------------------------------
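
The other diagnostics mentioned in the question should work on the extracted model in the same way. A quick sketch, assuming olsrr is already loaded and the car package (which provides durbinWatsonTest) is installed; note that ols_vif_tol only makes sense with more than one predictor, which Model2 has:

library(car)

ols_vif_tol(Model2$finalModel)        # tolerance and variance inflation factors
durbinWatsonTest(Model2$finalModel)   # Durbin-Watson test for autocorrelated residuals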
StupidWolf