F statistics and p-values

Question

I'm working on a project with three factors, and I'm measuring how long it takes a candle to burn. This is my data:

 Size    Brand   Scent    time
   1        1      1        255
   1        1      2        225
   1        2      1        283
   1        2      2        338
   1        3      1        192
   1        3      2        229
   2        1      1        1278
   2        1      2        1496
   2        2      1        3897
   2        2      2        2781
   2        3      1        1038
   2        3      2        1439

This is what I'm doing in R for the analysis, but for some reason it does not give me the F statistics and p-values.

> attach(data)
> fsize <- factor(Size)
> fbrand <- factor(Brand)
> fscent <- factor(Scent)
> model1 <- aov(time~fsize*fbrand*fscent)
> summary(model1)

                    Df Sum Sq Mean Sq
fsize                1 2507.1  2507.1
fbrand               2  829.8   414.9
fscent               1    4.4     4.4
fsize:fbrand         2  700.0   350.0
fsize:fscent         1    7.3     7.3
fbrand:fscent        2   89.5    44.8
fsize:fbrand:fscent  2  101.4    50.7

I'm sorry about the title....just freaking out because I wasn't expecting this not to work. — Jona, Apr 15 '12 at 21:33
I get F-values if I remove the term fsize:fbrand:fscent, but not with it present in the formula. — Matthew Lundberg, Apr 15 '12 at 21:40
@MatthewLundberg Indeed, perhaps the OP should read `?summary.aov` where all is explained. — joran, Apr 15 '12 at 21:46
@MatthewLundberg Do you know what might be going wrong? Is it the way the table is set up? Thanks! What are you putting in aov() to get F values? — Jona, Apr 15 '12 at 21:46
It's a saturated model. The prediction is exact, hence no random component, hence no F statistic. — IRTFM, Apr 15 '12 at 21:48
This will give you some tips on asking a good R question: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Chase, Apr 15 '12 at 21:51
the problem with this question isn't really lack of reproducibility, it's failure of statistical understanding/failure to RTFM (although admittedly it would be easy to miss or fail to understand the clause "... if there are non-zero residual degrees of freedom"). If I were rewriting the function I would make it issue a warning in this case ... — Ben Bolker, Apr 15 '12 at 21:59
@DWin Is there any way to work around this so I can get F-statistic values and p-values? Thank you so much for the help! — Jona, Apr 15 '12 at 22:08
@BenBolker Is there any way to work around this so I can get F-statistic values and p-values? Thank you so much for the help! — Jona, Apr 15 '12 at 22:08
This should really go to http://stats.stackexchange.com at this point, but briefly: because you have a saturated model (as many parameters as data points), the model fits perfectly. Therefore as @DWin says the prediction is exact. Therefore the residual SS is zero, so the F-statistics (MSQ/residual MSQ) are all *infinite*, so the p values will be exactly 0, or undefined, depending on how you look at it. In short: a silly model, so the results are silly. Are you sure this is the model you wanted to fit? Just guessing, but what about `time~fsize+fbrand+fscent` ? — Ben Bolker, Apr 15 '12 at 22:13
@BenBolker Why is it a silly model? I'm trying to use the factors of size, brand and scent to see their effect on the time it takes for a candle to burn. I have 2 levels of size, 3 brands and 2 scents. I'm trying to perform a 3-factor factorial design. What exactly am I doing that is silly? Thanks — Jona, Apr 15 '12 at 22:56
What you're doing that's silly is trying to test the significance of parameters for a fully parameterized (= saturated) model. If you wanted to test significance of all interactions in a full-factorial design, you would need multiple replicates in at least some (preferably all) of the factor level combinations (see @Mark Miller's answer below) — Ben Bolker, Apr 16 '12 at 00:14
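
The "saturated model" point made in the comments can be checked by counting. The following is only a minimal sketch: the data frame name `candles` and the model names `full` and `two_way` are illustrative and do not appear in the question. It compares the number of observations with the number of estimated coefficients, then drops the three-way interaction, as one commenter reported doing.

```r
# Candle data from the question: one observation per Size x Brand x Scent cell
candles <- data.frame(
  Size  = factor(rep(1:2, each = 6)),
  Brand = factor(rep(rep(1:3, each = 2), times = 2)),
  Scent = factor(rep(1:2, times = 6)),
  time  = c(255, 225, 283, 338, 192, 229,
            1278, 1496, 3897, 2781, 1038, 1439)
)

# Full factorial model, equivalent to time ~ fsize * fbrand * fscent
full <- aov(time ~ Size * Brand * Scent, data = candles)

n_obs    <- nrow(candles)        # 12 observations
n_coef   <- length(coef(full))   # 12 estimated coefficients
resid_df <- df.residual(full)    # 0 residual degrees of freedom

# Dropping the three-way interaction leaves 2 degrees of freedom for error,
# so summary() can report F statistics and p-values for this model
two_way <- aov(time ~ (Size + Brand + Scent)^2, data = candles)
df.residual(two_way)
summary(two_way)
```

With zero residual degrees of freedom there is no error mean square to divide by, which is why `summary()` prints only the `Df`, `Sum Sq` and `Mean Sq` columns for the full model. Whether it is reasonable to treat the three-way interaction as error is a statistical judgment, not something the code decides.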

Mark Miller · Answer 1 · 2012-04-15T22:39:00.023

If this were my homework, after reading all of the above comments, I might write some code like this and study it a little bit and try to think how it relates to the comments I had already received.

P.S. Then, for extra credit, I might attempt the same thing using a Bayesian approach.

# First data set: the 12 rows from the question
my.data <- matrix(c( 
   1 ,       1,      1,        255,
   1 ,       1,      2,        225,
   1 ,       2,      1,        283,
   1 ,       2,      2,        338,
   1 ,       3,      1,        192,
   1 ,       3,      2,        229,
   2 ,       1,      1,        1278,
   2 ,       1,      2,        1496,
   2 ,       2,      1,        3897,
   2 ,       2,      2,        2781,
   2 ,       3,      1,        1038,
   2 ,       3,      2,        1439),  nrow = 12, byrow = TRUE, 
  dimnames = list(NULL, c("Size", "Brand", "Scent",  "time")) )

my.data <- as.data.frame(my.data)

fsize  <- factor(my.data$Size)
fbrand <- factor(my.data$Brand)
fscent <- factor(my.data$Scent)

# Full factorial model: all main effects and interactions
model1 <- aov(my.data$time ~ fsize * fbrand * fscent)
summary(model1)

# Main effects only
model2 <- aov(my.data$time ~ fsize + fbrand + fscent)
summary(model2)



# Second data set: 13 rows
my.data <- matrix(c( 
   1 ,       1,      1,        255,
   1 ,       1,      2,        225,
   1 ,       2,      1,        283,
   1 ,       2,      2,        338,
   1 ,       2,      2,        300,
   1 ,       3,      1,        192,
   1 ,       3,      2,        229,
   2 ,       1,      1,        1278,
   2 ,       1,      2,        1496,
   2 ,       2,      1,        3897,
   2 ,       2,      2,        2781,
   2 ,       3,      1,        1038,
   2 ,       3,      2,        1439),  nrow = 13, byrow = TRUE, 
  dimnames = list(NULL, c("Size", "Brand", "Scent",  "time")) )

my.data <- as.data.frame(my.data)

fsize  <- factor(my.data$Size)
fbrand <- factor(my.data$Brand)
fscent <- factor(my.data$Scent)

model3 <- aov(my.data$time ~ fsize * fbrand * fscent)
summary(model3)

model4 <- aov(my.data$time ~ fsize + fbrand + fscent)
summary(model4)
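
Once a model has residual degrees of freedom, the F statistics and p-values can also be pulled out of the summary table rather than read off the printout. This is a hedged sketch using the 13-row data set above; the names `candles13`, `full13`, `tab`, `f_values` and `p_values` are illustrative and not part of the original answer.

```r
# 13-row data set from the answer: the original 12 rows plus one extra
# observation for Size 1, Brand 2, Scent 2
candles13 <- data.frame(
  Size  = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)),
  Brand = factor(c(1, 1, 2, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)),
  Scent = factor(c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  time  = c(255, 225, 283, 338, 300, 192, 229,
            1278, 1496, 3897, 2781, 1038, 1439)
)

# Same full factorial formula as model3
full13 <- aov(time ~ Size * Brand * Scent, data = candles13)
df.residual(full13)          # 1 residual degree of freedom

# summary() on an aov fit returns a list; its first element is the ANOVA table
tab <- summary(full13)[[1]]
f_values <- tab[["F value"]]   # one entry per term, NA for the Residuals row
p_values <- tab[["Pr(>F)"]]
print(tab)
```

A single residual degree of freedom makes the tests technically available but very weak, so this is best read as an illustration of the mechanics rather than a recommended analysis.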

Thanks for the help! So, entering the data as a matrix instead of attaching it makes a difference? I'm sorry, but I'm not seeing the difference between the two procedures you performed. Sorry for the confusion, I haven't had much experience in R. Thanks again! — Jona, Apr 15 '12 at 22:53
I have already done too much. I may try to help more after the deadline for your homework has passed. — Mark Miller, Apr 15 '12 at 23:02
I'm sorry, I didn't mean to imply that you need to do any more or to make you uncomfortable. I just didn't understand how what you're doing is different from what I'm doing. It just seemed that you were entering the data differently, and my professor has never entered it this way. It's a project I picked on my own and I was not expecting complications. Thanks for all your help though :) — Jona, Apr 15 '12 at 23:06
Compare the number of rows in the first vs. second example data set. — Ben Bolker, Apr 16 '12 at 00:20
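
One way to follow that hint is to count how many observations fall in each Size x Brand x Scent cell. The snippet below is a small sketch; `design13`, `design12`, `counts12` and `counts13` are illustrative names, and only the factor columns are needed for the count.

```r
# Factor level combinations of the 13-row data set in the answer
design13 <- data.frame(
  Size  = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Brand = c(1, 1, 2, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
  Scent = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2)
)

# Removing the fifth row recovers the layout of the original 12-row data
design12 <- design13[-5, ]

# Number of observations in each cell of the design
counts12 <- xtabs(~ Size + Brand + Scent, data = design12)
counts13 <- xtabs(~ Size + Brand + Scent, data = design13)

all(counts12 == 1)                    # TRUE: no cell is replicated
which(counts13 > 1, arr.ind = TRUE)   # locates the single replicated cell
```

A design in which every cell count is 1 cannot supply an error term for the full factorial model; any cell with a count above 1 contributes residual degrees of freedom.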
