0

I want to test homoscedasticity using the levene.test function from the lawstat package specifically because I like the bootstrap option and the fact that it returns a list rather then the unmanageable output of car::leveneTest. From the help of lawstat::levene.test it is clear that the NAs should be omitted by default from my dataset. Below I provide the original data.

testset.logcount<-c(6.86923171973098, 6.83122969386706, 7.30102999566398,7.54282542695918,6.40823996531185, 6.52891670027766, 6.61278385671974, 6.71933128698373, 6.96567197122011, 6.34242268082221, 6.60205999132796, 6.69897000433602, 6.6232492903979, 6.54157924394658, 6.43136376415899, 6.91381385238372,6.44715803134222, 6.30102999566398, 6.10037054511756, 6.7481880270062,NA, 4.89762709129044,5.26951294421792, 5.12385164096709, 5.11394335230684, 4.43136376415899, 5.73957234445009, 5.83250891270624, 5.3451776165427, 5.77887447200274, 5.38524868240322, 5.75127910398334, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
testset.treat<-structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CTL","TRM"), class = "factor")

when I execute lawstat::levene.test(y=testset.logcount,group=testset.treat) I get the following error message: Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

According to me the testset.treat clearly has two levels.

Also when using leveneTest(y=testset.logcount,group=testset.treat) or fligner.test(x=testset.logcount,g=testset.treat) both run without any errors.

I could not find out why I got this particular error with lawstat::levene.test and I am hoping that somebody here can help me out.

I am running R 3.0.0 on a x86_64-w64-mingw32/x64 platform (Windows 7, 64 bit).

FM Kerckhof
  • 1,270
  • 1
  • 14
  • 31
  • "According to me the testset.treat clearly has two levels." You're apparently mistaken. It may have two levels, but both of those levels need to actually occur in the data. You have a column somewhere with only a single unique value. – joran Aug 12 '13 at 15:04
  • 1
    Stepping through the function, it appears that it might be a bug, that while attempting to subset away the NAs, it creates the problem of a factor with only one level. Early on it does: `y <- y[!is.na(y)];group <- group[!is.na(y)]`. That seems less than ideal. – joran Aug 12 '13 at 15:12
  • @joran I think if you execute the example that I provided here you will see that this is clearly not the case. The data is unbalanced (20 CTL, 11 TRM) but there is no column with only a single unique value unless this is not what you meant. – FM Kerckhof Aug 12 '13 at 15:16
  • See my second comment. My first comment was simply explaining the error (which was correct, from R's perspective; you simply had to find where the bad factor was being created). – joran Aug 12 '13 at 15:17
  • BTW, this is a rather bad, but easily fixable bug. It should be reported to the package maintainer, ASAP. – joran Aug 12 '13 at 15:18
  • I will do this right now. Thank you for clearing this out. – FM Kerckhof Aug 12 '13 at 15:19
  • By the way, sorry if my first comment seemed brusque. My point was just that you should always trust the error message. It said you had a factor with just one level, so you have to trust R that it actually had that problem, and then dig around to find out why, that's all. – joran Aug 12 '13 at 15:26
  • Don't worry about that, I should really learn how to dig in the code a bit more, and in general more stuff about debugging in R. I was just assuming that it was my data rather then the code which claimed to be able to handle NAs. – FM Kerckhof Aug 12 '13 at 15:35

3 Answers3

2

For the record, this behavior was created by a bug in the function's attempt to remove NA values. It was attempting to do this using the code:

y <- y[!is.na(y)]
group <- group[!is.na(y)]

which, if there are NA values in y could be very bad. In this particular case it wiped out the second factor level.

It should be an easy fix, once reported.

joran
  • 169,992
  • 32
  • 429
  • 468
0

After some playing around with levene.test(), I think the problem is the missing values.

test <- cbind(testset.logcount, testset.treat)
test <- test[complete.cases(test),] #removing Nas
levene.test(test[,1], test[,2])


        modified robust Brown-Forsythe Levene-type test based on the absolute
        deviations from the median

data:  test[, 1]
Test Statistic = 0.9072, p-value = 0.3487

And this does match car's levene test (df = 29), so it must automatically delete missing rows

> leveneTest(y=testset.logcount,group=testset.treat)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.9072 0.3487
      29     
so13eit
  • 942
  • 3
  • 11
  • 22
  • 1
    It's worse than this. See my comment above. – joran Aug 12 '13 at 15:18
  • The problem is indeed the missing values. That I was aware of. But the function claims to omit NAs in its helpfile (literally *By default, NAs from the data are omitted*) which causes problems as pointed out by @joran – FM Kerckhof Aug 12 '13 at 15:22
0

I also went through a similar process. The aim is not to let the second factor be wiped out where 'y' is your numeric vector and 'g' the factor of the data.

yg = na.omit(data.frame(y,g))
y = yg[,1]
g = yg[,2]
Caxton
  • 1,050
  • 1
  • 8
  • 10