I'm struggling with an analysis in R. So far I have done some clustering and built decision trees.

I would like to build the tree from only ONE variable, but that does not seem possible with mclust::Mclust(). In theory it shouldn't be a problem.

Here is a reproducible example using the built-in attitude dataset:

library(mclust)
#> Package 'mclust' version 5.4.8
#> Type 'citation("mclust")' for citing this R package in publications.
# Using 2 variables it works as expected
ModelloT1 <- Mclust(attitude[1:2],modelNames = c("EII", "VII"))
ModelloT1$BIC
#> Bayesian Information Criterion (BIC): 
#>         EII       VII
#> 1 -483.9666 -483.9666
#> 2 -472.9461 -471.5116
#> 3 -462.3355 -467.6628
#> 4 -472.5525 -478.1093
#> 5 -481.2430 -485.7124
#> 6 -478.3516 -489.8570
#> 7 -485.2181        NA
#> 8 -488.2741        NA
#> 9 -492.2669        NA
#> 
#> Top 3 models based on the BIC criterion: 
#>     EII,3     VII,3     VII,2 
#> -462.3355 -467.6628 -471.5116

# But I can't use a single variable
ModelloT1 <- Mclust(attitude[2],modelNames = c("EII", "VII"))
#> Error in `[<-`(`*tmp*`, "1", mdl, value = bic(modelName = mdl, loglik = out$loglik, : subscript out of bounds

Created on 2021-11-22 by the reprex package (v2.0.1)

After that, I usually compute the information gain and then build the decision tree with the J48 function.

Can I use mclust::Mclust() or a similar tool to build a tree with a single variable?

moodymudskipper
Filippo
  • You will improve your chances of getting a quality answer by providing the code that reproduces this error, using datasets available to everyone. – moodymudskipper Nov 22 '21 at 10:54
  • Hi! Rather than the actual code, I would be interested to know whether it is possible to use Mclust with one variable, or whether I should use another package or method... – Filippo Nov 22 '21 at 11:12
  • I understand, but I promise you you'll get a good answer if you provide reproducible code, as it is now you probably won't (I hope you will!). Your current code isn't even syntactically valid (missing bracket) and you don't mention which package you're using. This tends to drive away users that might know how to help you. This might help: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – moodymudskipper Nov 22 '21 at 11:25
  • I tried to do as you said... is it ok for you? – Filippo Nov 22 '21 at 12:00
  • I arranged it a bit for you. This should be answerable now. FWIW I never used these tools, but `Mclust(attitude[c(2,2)],modelNames = c("EII", "VII"))` (repeating the unique variable) seems to work. (if you want to add to your question some things I've removed, you have access to the edit history) – moodymudskipper Nov 22 '21 at 12:27

1 Answer

If you have only one column, your data is univariate, not multivariate. EII and VII are multivariate model names, so you cannot use them here, which is why the single-variable call fails.

Run ?mclustModelNames to see the list of all the models. Among them you'll see:

‘"E"’ equal variance (one-dimensional)
‘"V"’ variable/unqual variance (one-dimensional)

So the following will work:

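library(mclust)
df <- data.frame(x = runif(100), y = runif(100))
# Two columns: multivariate model names
Mclust(df, modelNames = c("EII", "VII"))
# One column, passed as a plain vector: univariate model names
Mclust(df[["x"]], modelNames = c("E", "V"))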
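
Applied to the data from the question, the same idea might look like the sketch below. It assumes the variable of interest is the second column of attitude (complaints) and extracts it as a plain numeric vector with [[ ]]. The names x, fit and fit_default are only illustrative. The last call relies on Mclust choosing model names that match the dimension of the data when modelNames is left unset.

```r
library(mclust)

# Assumption: the single variable of interest is the second column of
# the built-in attitude dataset ("complaints"), taken as a numeric vector
x <- attitude[[2]]

# Univariate data, so use the one-dimensional model names "E" and "V"
fit <- Mclust(x, modelNames = c("E", "V"))
fit$BIC

# Alternative: leave modelNames unset and let Mclust pick suitable models
fit_default <- Mclust(x)
summary(fit_default)
```

The resulting cluster labels are in fit$classification, if you need them for the later information gain and J48 steps.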
StupidWolf