Edit:
My previous edit about read.file treating the first row as a header was correct, but that is not what causes the difference. Apparently the first six columns give different results depending on their order, regardless of whether they are called V1, V2, V3, V4, V5, V6 or X1, X3, X5, X7, X9, X11. I will investigate further a bit later.
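The X1, X3, X5, ... names come from base R. Assuming read.file hands this .dat file to read.table() (the X-prefixed names suggest it does), reading the first row as a header passes its numeric values through make.names(), which prefixes an X. The sketch below uses a tiny inline table as a stand-in for chess.dat (nothing is downloaded). It also shows that renaming by position, as set_names(names(testData_rf)) does, maps X1, X3, X5 to V1, V2, V3, so V1 to V6 in testData_rf_head and X1 to X11 in testData_rf_head_V2 are the same six columns.

```r
# Toy stand-in for chess.dat: the first row is data, not column names.
txt <- "1 3 5 7
1 3 5 8
2 3 6 7"

# header = FALSE keeps every row and names the columns V1, V2, ...
no_head <- read.table(text = txt, header = FALSE)

# header = TRUE consumes the first row as names; make.names() turns
# "1", "3", ... into the syntactic names X1, X3, ...
with_head <- read.table(text = txt, header = TRUE)

names(no_head)    # V1 V2 V3 V4
names(with_head)  # X1 X3 X5 X7
nrow(no_head)     # 3
nrow(with_head)   # 2: the first row became the header

# Positional renaming (what set_names(names(testData_rf)) does):
# X1 -> V1, X3 -> V2, X5 -> V3, ... so "V3" now holds the former "X5".
renamed <- with_head
names(renamed) <- names(no_head)[seq_along(renamed)]
identical(renamed$V3, with_head$X5)  # TRUE
```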
library(mclust)   # Mclust()
library(psych)    # read.file()
library(magrittr) # %>%, %<>%, set_names()
# sessionInfo()
# R version 3.4.0 (2017-04-21)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=English_United Kingdom.1252
# [2] LC_CTYPE=English_United Kingdom.1252
# [3] LC_MONETARY=English_United Kingdom.1252
# [4] LC_NUMERIC=C
# [5] LC_TIME=English_United Kingdom.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods
# [7] base
#
# other attached packages:
# [1] magrittr_1.5 psych_1.7.5 mclust_5.3
#
# loaded via a namespace (and not attached):
# [1] compiler_3.4.0 parallel_3.4.0 tools_3.4.0
# [4] foreign_0.8-68 rstudioapi_0.6 mdaddins_0.0.0001
# [7] nlme_3.1-131 mnormt_1.5-5 grid_3.4.0
# [10] lattice_0.20-35
testData_rt <- read.table("http://fimi.ua.ac.be/data/chess.dat")
testData_rf <- read.file("http://fimi.ua.ac.be/data/chess.dat", header = FALSE) # without header = FALSE, read.file uses the first row as column names
testData_rf_head <- read.file("http://fimi.ua.ac.be/data/chess.dat") # default: first row becomes the header (names X1, X3, X5, ...)
testData_rf_head %<>% set_names(names(testData_rf)) # rename by position to V1, V2, V3, ...
testData_rf_head_V2 <- read.file("http://fimi.ua.ac.be/data/chess.dat") # same as above, but keeps the X1, X3, X5, ... names
testData_rt %>% str()
testData_rf %>% str()
testData_rf_head %>% str()
# Same res.:
summary(Mclust(subset(testData_rt, select = c(V1, V3, V5, V7, V9, V11))))
summary(Mclust(subset(testData_rt, select = c(V11, V9, V1, V3, V5, V7))))
# Same res.:
summary(Mclust(subset(testData_rf, select = c(V1, V3, V5, V7, V9, V11))))
summary(Mclust(subset(testData_rf, select = c(V11, V9, V1, V3, V5, V7))))
# Same res.:
summary(Mclust(subset(testData_rf_head, select = c(V1, V3, V5, V7, V9, V11))))
summary(Mclust(subset(testData_rf_head, select = c(V11, V9, V1, V3, V5, V7))))
# Different res.:
summary(Mclust(subset(testData_rf_head_V2, select = c(X1, X3, X5, X7, X9, X11))))
summary(Mclust(subset(testData_rf_head_V2, select = c(X11, X9, X1, X3, X5, X7))))
# Different res.:
summary(Mclust(subset(testData_rf_head, select = c(V1, V2, V3, V4, V5, V6))))
summary(Mclust(subset(testData_rf_head, select = c(V6, V5, V1, V2, V3, V4))))
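The "Same res." / "Different res." comments above compare the printed summaries of pairs of fits. A slightly more systematic check refits with the columns in several orders and compares the classifications directly, ignoring how the clusters happen to be numbered. A minimal base-R sketch: order_check, hc_fit and first_col_fit are hypothetical names, hclust() only stands in for Mclust() so the example runs without mclust or the data, and with mclust one might pass something like function(d) Mclust(d)$classification as fit_fun (not run here).

```r
# Hypothetical harness: refit with the columns in each given order and report
# whether the result is the same partition as with the original column order.
# fit_fun must take a data frame and return one cluster label per row.
order_check <- function(data, fit_fun, orders) {
  ref <- fit_fun(data)
  vapply(orders, function(o) {
    cl <- fit_fun(data[, o, drop = FALSE])
    tab <- table(ref, cl)
    # Same partition <=> a one-to-one mapping between the two label sets,
    # i.e. exactly one non-zero cell in every row and every column.
    all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
  }, logical(1))
}

# Toy data of small integers, so Euclidean distances are exact in any order
toy <- data.frame(a = c(1, 2, 10, 11), b = c(1, 1, 10, 12), c = c(0, 1, 9, 10))
orders <- list(c(3, 2, 1), c(2, 3, 1))

# Order-invariant method: distance-based hierarchical clustering
hc_fit <- function(d) cutree(hclust(dist(d)), k = 2)
order_check(toy, hc_fit, orders)         # TRUE TRUE

# Deliberately order-dependent rule: threshold on whichever column is first
first_col_fit <- function(d) as.integer(d[[1]] > 1.5)
order_check(toy, first_col_fit, orders)  # FALSE FALSE
```

For a graded rather than yes/no comparison of two fits, mclust also provides adjustedRandIndex(), which equals 1 for identical partitions.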
Old answer:
Have done my best to investigate the issue:
- Current R (3.4.0) and mclust (5.3) tested: order and seed had no effect;
- mclust 4.2 (current on Dec 5 '13 when the question was asked), the same, no effect;
- R 2.15.3 mentioned by @user3068797: could not compile mclust 4.2; gave up, as it would take too long to debug;
- @Cody did not provide a sessionInfo(), so I do not know where else to dig.
To the code:
library(mclust)
sessionInfo()
# R version 3.4.0 (2017-04-21)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
#
# other attached packages:
# [1] mclust_5.3
testData <- read.table("http://fimi.ua.ac.be/data/chess.dat")
## Seed and order have no effect:
# set.seed(1)
set.seed(2)
summary(Mclust(subset(testData, select = c(V1, V3, V5, V7, V9, V11))))
# ----------------------------------------------------
# Gaussian finite mixture model fitted by EM algorithm
# ----------------------------------------------------
#
# Mclust EII (spherical, equal volume) model with 9 components:
#
# log.likelihood n df BIC ICL
# -3597.466 3196 63 -7703.32 -7735.137
#
# Clustering table:
# 1 2 3 4 5 6 7 8 9
# 774 150 752 486 227 224 238 178 167
set.seed(1)
# set.seed(2)
summary(Mclust(subset(testData, select = c(V11, V9, V1, V3, V5, V7))))
# ----------------------------------------------------
# Gaussian finite mixture model fitted by EM algorithm
# ----------------------------------------------------
#
# Mclust EII (spherical, equal volume) model with 9 components:
#
# log.likelihood n df BIC ICL
# -3597.466 3196 63 -7703.32 -7735.137
#
# Clustering table:
# 1 2 3 4 5 6 7 8 9
# 774 150 752 486 227 224 238 178 167
## Question asked Dec 5 '13
## mclust 4.2 modified on 2013-07-19, 4.3 introduced on 2014-03-31
devtools::install_version(package = 'mclust', version = '4.2')
## Fix mclust:::unchol
# mclust:::unchol
unchol <- function(x, upper = NULL)
{
  if(is.null(upper)) {
    upper <- any(x[row(x) < col(x)])
    lower <- any(x[row(x) > col(x)])
    if(upper && lower)
      stop("not a triangular matrix")
    if(!(upper || lower)) {
      x <- diag(x)
      return(diag(x * x))
    }
  }
  dimx <- dim(x)
  storage.mode(x) <- "double"
  .Fortran("uncholf",
           as.logical(upper),
           x,
           as.integer(nrow(x)),
           as.integer(ncol(x)),
           integer(1),
           PACKAGE = "mclust")[[2]]
}
assignInNamespace("unchol", unchol, ns = "mclust")
# fixInNamespace(unchol, pos = "package:mclust")
mclust:::unchol
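In case it helps to see what the patched function computes: as I read it, unchol() rebuilds a matrix from its triangular Cholesky factor, which matches the diagonal branch above returning diag(x * x). A base-R sketch of the same computation; unchol_base is a hypothetical name, not part of mclust, and it skips the triangularity check and the Fortran call.

```r
# Hypothetical base-R equivalent of "undo a Cholesky factorisation"
unchol_base <- function(R, upper = TRUE) {
  # upper factor R: S = t(R) %*% R;  lower factor L: S = L %*% t(L)
  if (upper) crossprod(R) else tcrossprod(R)
}

S <- matrix(c(4, 2, 2, 3), nrow = 2)  # symmetric positive definite
R <- chol(S)                          # upper triangular factor

all.equal(unchol_base(R), S)                    # TRUE
all.equal(unchol_base(t(R), upper = FALSE), S)  # TRUE
unchol_base(diag(c(2, 3)))                      # diagonal case: squares, 4 and 9
```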
## Again, seed and order have no effect:
# set.seed(1)
set.seed(2)
summary(Mclust(subset(testData, select = c(V1, V3, V5, V7, V9, V11))))
# ----------------------------------------------------
# Gaussian finite mixture model fitted by EM algorithm
# ----------------------------------------------------
#
# Mclust EII (spherical, equal volume) model with 9 components:
#
# log.likelihood n df BIC ICL
# -3597.466 3196 63 -7703.32 -7735.137
#
# Clustering table:
# 1 2 3 4 5 6 7 8 9
# 774 150 752 486 227 224 238 178 167
#
# Warning messages:
# 1: In summary.mclustBIC(Bic, data, G = G, modelNames = modelNames) :
# best model occurs at the min or max # of components considered
# 2: In Mclust(subset(testData, select = c(V1, V3, V5, V7, V9, V11))) :
# optimal number of clusters occurs at max choice
set.seed(1)
# set.seed(2)
summary(Mclust(subset(testData, select = c(V11, V9, V1, V3, V5, V7))))
# ----------------------------------------------------
# Gaussian finite mixture model fitted by EM algorithm
# ----------------------------------------------------
#
# Mclust EII (spherical, equal volume) model with 9 components:
#
# log.likelihood n df BIC ICL
# -3597.466 3196 63 -7703.32 -7735.137
#
# Clustering table:
# 1 2 3 4 5 6 7 8 9
# 774 150 752 486 227 224 238 178 167
#
# Warning messages:
# 1: In summary.mclustBIC(Bic, data, G = G, modelNames = modelNames) :
# best model occurs at the min or max # of components considered
# 2: In Mclust(subset(testData, select = c(V11, V9, V1, V3, V5, V7))) :
# optimal number of clusters occurs at max choice
## Check R 2.15.3 from https://cran.r-project.org/bin/windows/base/old/2.15.3/
## Tried fixing con <- gzcon(url("http://cran.rstudio.com/src/contrib/Meta/archive.rds", 'rb')), but then could not get mclust 4.2 to compile
devtools::install_version(package = 'mclust', version = '4.2')
Edit:
Fortran functions unchol (mclust 4.2) and uncholf (mclust 5.3) do not differ:
uncholf 5.3, unchol 4.3
The Mclust function itself does differ between the versions, but both give the same results, so I guess the changes were mostly bug fixes and the like: Mclust 5.3, Mclust 4.3
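For the R-level functions (not the Fortran routines), a crude alternative to reading the linked sources side by side is to save deparse() output under each installed version and compare the two. A base-R sketch; diff_source and the file name are hypothetical, and because it uses set differences it ignores line order and duplicated lines, so it is a quick check rather than a real diff.

```r
# Under each installed mclust version one could first run, e.g.
#   writeLines(deparse(mclust::Mclust), "Mclust_5.3.R")  # hypothetical file name
# and later read both files back with readLines() before comparing.

# Hypothetical helper: lines present in only one of two deparsed sources
diff_source <- function(old, new) {
  list(removed = setdiff(old, new), added = setdiff(new, old))
}

# Toy functions standing in for two versions of the same function
f_old <- function(x) { y <- x * x; y }
f_new <- function(x) { y <- x^2; y }

d <- diff_source(deparse(f_old), deparse(f_new))
d$removed  # the "y <- x * x" line
d$added    # the "y <- x^2" line
```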