While fitting a univariate model with Mclust, I get the following error:

Error in mstepE(data = as.matrix(data)[initialization$subset, ], z = z,  : 
  row dimension of z should equal data length

I am using the code from the "Initialisation" section of the mclust vignette: https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html#initialisation

This is the part of the code that raises the error:

df1 <- dataSample
BIC <- NULL
for(j in 1:20){
  rBIC <- mclustBIC(df1, verbose = TRUE,
                    initialization = list(hcPairs = randomPairs(df1)))
  BIC <- mclustBICupdate(BIC, rBIC)
}
summary(BIC)

The following link contains the data passed to the variable 'df1' (file name: dataSample.csv): https://drive.google.com/open?id=0Bzau9RsRnQreYk9XOWVBSm91b2o4NTQ4RlA2UFdWbDBVOVpR

na ja
  • In case you don't get much response: many SO users are not keen on data provided as a link, for two reasons that I'm aware of: (1) links can go stale, leaving this question [*unreproducible*](https://stackoverflow.com/questions/5963269); and (2) though I've not heard of it happening, some avoid clicking links for fear of downloading malware ... though I'm less concerned about *that*, I've seen questions with huge datasets, not something I want to download on a whim. If you get low/no response, consider providing a smaller sample of this data (that shows the same error). – r2evans Jan 30 '19 at 16:32
  • Thanks @r2evans! As you assumed, my data set is a large vector with 8254 elements. I don't get this error if I pass only the first 2000 elements as numeric input to 'df1', but as soon as the input has more than 2000 elements (i.e., from the 2001st element onward), the error appears. Is there a better way to supply the data? Does mclustBIC have a limit on the length of the input vector? – na ja Jan 31 '19 at 05:31
  • Is it just the quantity of data that's doing it, or is there something (e.g., `NA` or `NaN`) around element 2001? One way to give us "this much data" is to determine if you can repeat the error with programmatically-defined numbers, either random (please include `set.seed`) or `seq`uential or similar. I'm not an `mclust` guru so I can't help with the math, but if it's tangential to the math then I might be able to find something. – r2evans Jan 31 '19 at 14:50
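A quick way to act on the suggestion above in base R is to look for non-finite values (`NA`, `NaN`, `Inf`) near element 2001 and to build a reproducible stand-in for the data with `set.seed`. This is a minimal sketch: the simulated two-component data and the helper name `find_bad_values` are hypothetical, not taken from the question, and the single-column layout of dataSample.csv is an assumption.

```r
# Hypothetical stand-in for dataSample.csv: 8254 simulated values.
# With the real file, something like read.csv("dataSample.csv")[[1]]
# might be used instead (assumes a single numeric column).
set.seed(2019)
x <- c(rnorm(4000, mean = 0, sd = 1), rnorm(4254, mean = 5, sd = 1.5))

# Hypothetical helper: positions of NA, NaN, Inf or -Inf values.
find_bad_values <- function(v) which(!is.finite(v))

# Copy with a NaN planted at element 2001, purely to show what a hit looks like.
x_bad <- x
x_bad[2001] <- NaN

print(find_bad_values(x))      # integer(0): every value is finite
print(find_bad_values(x_bad))  # 2001
print(x_bad[1998:2004])        # inspect the neighbourhood of element 2001
```

If the real data contain no non-finite values, that suggests the input length, not particular values, is what triggers the error.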

1 Answer

This is the solution I got from one of the authors of the 'mclust' library (Prof. Luca Scrucca):

"there was a bug due to the use of automatic subset that clash when hcPairs are provided. I have fixed it in the current dev version of mclust. Since submission to CRAN won't happen shortly, you may use the following code to avoid the error with the current release of mclust:

rBIC <- mclustBIC(df1, verbose = T,
                  initialization = list(hcPairs = randomPairs(df1),
                                        subset = 1:NROW(df1)))

When the bug fix will be released, the subset argument could be omitted as it is redundant."

With this change, the code now runs without errors.
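For reference, here is a sketch of the question's full loop with the workaround applied. Assumptions: the data are simulated (a hypothetical stand-in for dataSample.csv with the same 8254 elements), and because newer mclust releases may provide `hcRandomPairs()` in place of `randomPairs()`, the sketch uses whichever of the two the installed version has. The 2000-element threshold reported in the comments is consistent with the default of `mclust.options("subset")`, which as far as I know is 2000.

```r
library(mclust)

# Hypothetical stand-in for dataSample.csv (8254 univariate values).
set.seed(2019)
df1 <- c(rnorm(4000, mean = 0, sd = 1), rnorm(4254, mean = 5, sd = 1.5))

# Older releases (such as the one in the question) provide randomPairs();
# newer ones may provide hcRandomPairs() instead, so use whichever exists.
ns <- asNamespace("mclust")
pairs_fun <- if (exists("hcRandomPairs", envir = ns, inherits = FALSE)) {
  get("hcRandomPairs", envir = ns)
} else {
  get("randomPairs", envir = ns)
}

BIC <- NULL
for (j in 1:20) {
  rBIC <- mclustBIC(df1, verbose = FALSE,
                    initialization = list(hcPairs = pairs_fun(df1),
                                          # Workaround from the mclust author:
                                          # pass every row as the subset so no
                                          # automatic subset is drawn
                                          subset = 1:NROW(df1)))
  # Keep the best BIC per model and number of components across runs
  BIC <- mclustBICupdate(BIC, rBIC)
}
summary(BIC)
```

On a release that includes the fix, the `subset` entry should be redundant, as the author notes.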

na ja