I am trying to use LDA to model data that are multivariate non-normal. I was hoping to get a more robust estimate by choosing method = 'mve'. However, this leads to predictions that vary from one fit to the next; a minimal example follows.

library(MASS)
library(caret)
set.seed(1)

data(iris)

acc <- list()
for (i in 1:100) {
    post_hoc <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = 'mve')
    conf <- table(list(predicted = predict(post_hoc)$class, observed = iris$Species))
    acc <- append(acc, as.numeric(confusionMatrix(conf)$overall[1]))
}
hist(as.numeric(acc))

Looking at the lda.R code, I see that it does not set a seed for the cov.rob function. How can I get reproducible results?

2 Answers

If you call set.seed() immediately before each lda() call, the results are identical:

f <- \() {
  acc <- list()
  for (i in 1:100) {
    set.seed(1)
    post_hoc <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data=iris , method = 'mve')
    conf <- table(list(predicted=predict(post_hoc)$class , observed=iris$Species ))
    acc <- append(acc, as.numeric(confusionMatrix(conf)$overall[1]))
  }
  acc
}

library(MASS); library(caret)

acc1 <- f()
all(sapply(acc1, all.equal, acc1[[1]]))
# [1] TRUE
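
The same idea can be packaged so the seed travels with the fit. This is only a minimal sketch: `lda_mve_seeded` and its `seed` argument are hypothetical names introduced here, not part of MASS, and it assumes the only source of randomness is the robust covariance step that draws from R's global RNG.

```r
library(MASS)

# Hypothetical helper (not part of MASS): reset the global RNG right
# before fitting, so the random subsampling starts from the same state.
lda_mve_seeded <- function(formula, data, seed = 1) {
  set.seed(seed)
  lda(formula, data = data, method = "mve")
}

fit1 <- lda_mve_seeded(Species ~ ., iris)
fit2 <- lda_mve_seeded(Species ~ ., iris)

# Both fits start from the same RNG state, so their predictions agree.
same_pred <- identical(predict(fit1)$class, predict(fit2)$class)
print(same_pred)
```

Note that the helper overwrites the global RNG state on every call, which may matter if other code in the session relies on the random stream.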
jay.sf
  • The issue is not whether two runs are identical; there should be no variation within a single run. If you use a different 'method', e.g. moment/t/mle, you get reproducible results. Isn't setting the seed supposed to instruct the 'cov.rob' method to sample the same subset of data points and produce the same results? – Israel Zadok Jun 02 '22 at 18:27
  • @IsraelZadok Well, then set it before `lda` which is stochastic, see update. – jay.sf Jun 02 '22 at 18:53
O.K., I've edited a version of lda.R with a set.seed() and the results are reproducible. This is strange.
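
One possible reading of this behaviour, offered as a sketch rather than a definitive explanation: a single set.seed() fixes the whole random stream, not each individual call, so every stochastic call continues from wherever the previous one left off. The example below calls cov.rob directly on the raw iris measurements (an assumption made for illustration; it is not necessarily the exact matrix that lda passes internally) and compares the `best` subset it returns.

```r
library(MASS)

x <- as.matrix(iris[, 1:4])

set.seed(1)
first  <- cov.rob(x, method = "mve")  # starts from the seeded state
second <- cov.rob(x, method = "mve")  # continues the same RNG stream

set.seed(1)
again  <- cov.rob(x, method = "mve")  # stream restarted from the seed

# Restarting the stream reproduces the first result exactly.
same_after_reseed <- identical(first$best, again$best)
print(same_after_reseed)

# Without a reseed, the second call may pick a different subset.
print(identical(first$best, second$best))
```

Under this reading, a script with one set.seed() at the top is reproducible as a whole, while the individual fits inside the loop still differ from one another.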