
I am trying to perform the multivariate Shapiro-Wilk test using the mvnormtest package:

mvnormtest::mshapiro.test(data)

but there is an error:

Error in mshapiro.test(data) : U[] is not a matrix with number of columns (sample size) between 3 and 5000

Although the dimension of data is 40000x10, it is not running.

A sample of data is:

structure(list(V1 = c(78.16, 99.19, 99.95, 102.44, 91.2, 0, 0, 0, 0, 0), V2 = c(9588736, 102400, 102400, 102400, 1593344, 4112384, 4112384, 4112384, 4112384, 4112384), V3 = c(149422080L, 145465344L, 138002432L, 137867264L, 103489536L, 81920L, 81920L, 81920L, 81920L, 81920L)), .Names = c("V1", "V2", "V3"), row.names = c(NA, 10L), class = "data.frame")
NelsonGon
shaifali Gupta
  • Does the problem occur with the provided sample? Also, use `dput` to provide data samples (see [here](https://stackoverflow.com/questions/49994249/example-of-using-dput)) – January Jul 16 '19 at 07:25
  • Also, do you have 40,000 samples with 10 variables each, or 10 samples with 40,000 variables? The problem may be that mshapiro.test wants samples in *rows*, and variables in *columns*. – January Jul 16 '19 at 07:28
  • @January The problem occurs with the full data, which has dimension 40000x10. This is just a very small chunk of it; I will not be able to share the full data. It is 40000 samples with 10 variables. The test does not work even if I transpose the data matrix. – shaifali Gupta Jul 16 '19 at 07:29
  • I get the same error with your training data, which are in a `data.frame`. When I switch it to a `matrix` it works. (I get another error because of the small sample size, but that's irrelevant.) Try converting your data to a matrix. – Humpelstielzchen Jul 16 '19 at 07:41

1 Answer


Let us generate a data set with 6,000 columns:

df <- matrix(rnorm(6000 * 10), nrow=10)


> mvnormtest::mshapiro.test(df)
Error in mvnormtest::mshapiro.test(df) : 
  sample size must be between 3 and 5000

The matrix above has 10 rows and 6,000 columns, and the test complains about the sample size, so the samples are in columns and the variables are in rows. The message you are getting is confusing: as noted in the comments, it also appears when the input is a `data.frame` rather than a `matrix`, whatever its dimensions. With a matrix, my version (0.1-9) simply complains that the number of samples is too large.
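A minimal sketch of how the input could be prepared, assuming (from the 10 x 6,000 example above, which triggers the sample-size error) that the function counts columns as samples. The `data` object below is a made-up stand-in for the real 40000x10 data frame, and drawing a random subset of 5,000 rows is only one possible way to get under the size limit, not something the package prescribes.

```r
# Hypothetical stand-in for the real data: 40000 samples (rows), 10 variables (columns)
set.seed(1)
data <- as.data.frame(matrix(rnorm(40000 * 10), ncol = 10))

is.matrix(data)   # FALSE: a data.frame is not a matrix

# Keep at most 5000 samples (random subset; an assumption, not a recommendation),
# convert to a numeric matrix, and transpose so that variables are in rows
# and samples are in columns.
idx <- sample(nrow(data), 5000)
U <- t(as.matrix(data[idx, ]))
dim(U)            # 10 5000

# Run the test only if the package is installed
if (requireNamespace("mvnormtest", quietly = TRUE)) {
  print(mvnormtest::mshapiro.test(U))
}
```

Whether a test on a subsample is acceptable depends on the analysis; with thousands of samples even tiny departures from normality tend to be flagged.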

However, I think you will run into problems anyway: passed as it is, a 40000x10 matrix is treated as 40,000 variables and 10 samples, which means that mshapiro.test will run into the problem of singularity.
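The singularity issue can be illustrated in base R alone. This is a toy sketch with made-up dimensions (20 variables, 10 samples), and it assumes the test has to invert the variables-by-variables cross-product matrix of the centred data, as a covariance-based statistic would.

```r
# Toy dimensions (made up): 20 variables in rows, 10 samples in columns
set.seed(1)
U <- matrix(rnorm(20 * 10), nrow = 20)

R <- U - rowMeans(U)   # centre each variable (row)
S <- R %*% t(R)        # 20 x 20 cross-product matrix

qr(S)$rank             # cannot exceed samples - 1, so S is rank-deficient

# Inverting S is expected to fail or to be numerically meaningless
res <- tryCatch(solve(S), error = function(e) conditionMessage(e))
```

With more variables than samples the rank of `S` is capped by the number of samples, so no amount of data cleaning fixes this; the matrix has to be oriented with the 10 variables in rows.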

January