
I'm trying to run some tests on my linear regression model with functions such as ols_vif_tol(), ols_test_normality() or durbinWatsonTest(), which only work with lm() models. However, I fitted my model using the train() function from the caret package.

> fitcontrol = trainControl( method = "repeatedcv", number = floor(0.4*nrow(TrainData)), repeats = RepeatsTC, returnResamp = "all", savePredictions = "all")
> BestModel = train(Formula2, data = TrainData, trControl = fitcontrol, method = "lm", metric = "RMSE")

In the end I get this output:

> BestModel
Linear Regression 

10 samples
 1 predictor

No pre-processing
Resampling: Cross-Validated (4 fold, repeated 100 times) 
Summary of sample sizes: 7, 8, 8, 7, 7, 8, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  10.75823  0.8911761  9.660638

Tuning parameter 'intercept' was held constant at a value of TRUE

What I want is this kind of output:

> GoodModel = lm(Formula2, data = FinalData)
> GoodModel

Call:
lm(formula = Formula2, data = FinalData)

Coefficients:
    (Intercept)  Evol.INDUS.PROD  
          4.089            3.908

So even though I used method = "lm", I don't get the same kind of output, which gives me an error when I run my tests.

> ols_test_normality(BestModel)
Error in ols_test_normality.default(BestModel) : y must be numeric
> ols_test_normality(GoodModel)
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.9042         0.1528 
Kolmogorov-Smirnov        0.1904         0.6661 
Cramer-von Mises          1.1026         0.0010 
Anderson-Darling          0.4615         0.2156 
-----------------------------------------------

I know there is an as.lm function, but I tried it and my installed version doesn't support it.

Does anyone know how to get the same form of output as lm() after using train(), or a way to use the output of BestModel to run those tests?

EDIT

Here is a simpler example that produces the same error and that you can use to try different tests.

install.packages("olsrr")
install.packages("caret")
library(olsrr)
library(caret)

first = sample(1:10, 10, rep = TRUE)
second = sample(10:20, 10, rep = TRUE)
third = sample(20:30, 10, rep = TRUE)
Df = data.frame(first, second, third)
Df

#Create a model with lm

Model1 = lm(first ~ second + third, data = Df)
Model1
summary(Model1)
ols_test_normality(Model1)

#Create a model with caret::train

Fold = sample(1:nrow(Df), size = 0.8*nrow(Df), replace = FALSE)
TrainData = Df[Fold,]
TestData = Df[-Fold,]
fitcontrol = trainControl(method = "repeatedcv", number = 2, repeats = 10)
Model2 = train(first ~ second + third, data = TrainData, trControl = fitcontrol, method = "lm")
Model2
summary(Model2)
ols_test_normality(Model2)

Thank you

Tim007
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 01 '21 at 22:31

1 Answer


Your Model2 is a train object, so ols_test_normality will not work on it:

class(Model2)
[1] "train"         "train.formula"

The final lm model is stored under finalModel:

class(Model2$finalModel)
[1] "lm"

ols_test_normality(Model2$finalModel)
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.9843         0.9809 
Kolmogorov-Smirnov        0.149          0.9822 
Cramer-von Mises          0.4212         0.0611 
Anderson-Darling          0.1677         0.9004 
-----------------------------------------------
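
The other diagnostics mentioned in the question should work on the extracted model in the same way. A quick sketch, assuming olsrr is already loaded and the car package (which provides durbinWatsonTest) is installed; note that ols_vif_tol only makes sense with more than one predictor, which Model2 has:

library(car)

ols_vif_tol(Model2$finalModel)        # tolerance and variance inflation factors
durbinWatsonTest(Model2$finalModel)   # Durbin-Watson test for autocorrelated residuals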
StupidWolf