I am pairing up online guides with an old text to learn R (page 182 - http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf). When I use data from an R package (as in the tutorial examples) there is no problem. However, when I use data from my text, I always end up with no F value and a warning.

Take a look:

data into a data.frame:

car.noise <- data.frame( speed = c("idle", "0-60mph", "over 60"), chrysler = c(41,65,76), 
bmw = c(45,67,72), ford = c(44,66,76), chevy = c(45,66,77), subaru = c(46,76,64))

check the data.frame:

car.noise
    speed chrysler bmw ford chevy subaru
1    idle       41  45   44    45     46
2 0-60mph       65  67   66    66     76
3 over 60       76  72   76    77     64

melt data.frame:

library(reshape2)  # assumption: melt() here comes from reshape2 (or the older reshape package)
mcar.noise <- melt(car.noise, id.vars = "speed")

check melted data.frame

> mcar.noise
     speed variable value
1     idle chrysler    41
2  0-60mph chrysler    65
3  over 60 chrysler    76
4     idle      bmw    45
5  0-60mph      bmw    67
6  over 60      bmw    72
7     idle     ford    44
8  0-60mph     ford    66
9  over 60     ford    76
10    idle    chevy    45
11 0-60mph    chevy    66
12 over 60    chevy    77
13    idle   subaru    46
14 0-60mph   subaru    76
15 over 60   subaru    64

perform anova and get warning:

> anova(lm(value ~ variable * speed, mcar.noise))
Analysis of Variance Table

Response: value 
               Df  Sum Sq Mean Sq F value Pr(>F)
variable        4    6.93    1.73               
speed           2 2368.13 1184.07               
variable:speed  8  205.87   25.73               
Residuals       0    0.00                       
Warning message:
In anova.lm(lm(value ~ variable * speed, mcar.noise)) :
  ANOVA F-tests on an essentially perfect fit are unreliable

The only 2 explanations I can come up with:

1: I am coding incorrectly

2: The text examples are too 'perfect' a fit, since they are trying to show a clear example

tora0515

1 Answer

You are trying to fit a model that gives a separate mean to every combination of variable*speed. With the data you have, that means there is no replication at all: each combination has exactly one observation. It would be like trying to compare two groups when you only have a single value from each group.
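To see the lack of replication directly, here is a minimal sketch in base R. It rebuilds the long-format data from the question without `melt` (the names `long` and `fit_int` are made up for this illustration), then counts the observations per cell and the coefficients of the interaction model.

```r
# Data copied from the question
car.noise <- data.frame(
  speed    = c("idle", "0-60mph", "over 60"),
  chrysler = c(41, 65, 76),
  bmw      = c(45, 67, 72),
  ford     = c(44, 66, 76),
  chevy    = c(45, 66, 77),
  subaru   = c(46, 76, 64)
)

# Long format built by hand (hypothetical name `long`); same layout as the melted data
long <- data.frame(
  speed    = factor(rep(car.noise$speed, times = 5)),
  variable = factor(rep(names(car.noise)[-1], each = 3)),
  value    = unlist(car.noise[-1], use.names = FALSE)
)

# Every variable-by-speed cell holds exactly one observation
cell_counts <- table(long$variable, long$speed)
print(cell_counts)

# The interaction model estimates one parameter per cell
fit_int <- lm(value ~ variable * speed, data = long)
n_coef <- length(coef(fit_int))  # 15
n_obs  <- nrow(long)             # 15
df_res <- df.residual(fit_int)   # 0: nothing left to estimate the error variance
```

With as many coefficients as observations, the fit passes through every data point, which is what the "essentially perfect fit" warning is reporting.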

If you look at the line for "Residuals" in your anova table, you should notice that there are no degrees of freedom left and that the residual sum of squares is 0 as well. You could try to fit a model without an interaction if you feel it is appropriate, but you don't have enough data to fit a model with an interaction.
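A sketch of the model without interaction, assuming the same data as in the question. The names `long`, `fit_add`, and `tab` are chosen here and are not from the original post. Whether additivity is a reasonable assumption for these data is a separate question.

```r
# Data copied from the question
car.noise <- data.frame(
  speed    = c("idle", "0-60mph", "over 60"),
  chrysler = c(41, 65, 76),
  bmw      = c(45, 67, 72),
  ford     = c(44, 66, 76),
  chevy    = c(45, 66, 77),
  subaru   = c(46, 76, 64)
)

# Long format built with base R (hypothetical name `long`)
long <- data.frame(
  speed    = factor(rep(car.noise$speed, times = 5)),
  variable = factor(rep(names(car.noise)[-1], each = 3)),
  value    = unlist(car.noise[-1], use.names = FALSE)
)

# Additive model: `+` instead of `*`, so no interaction term is fitted
fit_add <- lm(value ~ variable + speed, data = long)
tab <- anova(fit_add)
print(tab)

# The 8 degrees of freedom the interaction used now belong to the residuals
res_df <- tab["Residuals", "Df"]  # 8
```

Because the residual line now has degrees of freedom and a nonzero sum of squares, the F value and Pr(>F) columns are filled in and the warning no longer appears.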

Dason
  • Ah, so my mistake is using code for anova with replication. Any direction where I should look for two-way anova without replication? – tora0515 Dec 18 '11 at 06:56
  • Like I said you could fit a model without interaction if you think the effects of variable and speed are additive. But I'm not sure if that would be a good assumption in this case. – Dason Dec 18 '11 at 07:10
  • I am just following an example from a text, so not too worried about assumptions now, just need to match what's in the book and try and understand what each piece means, all the fun assuming comes in the next chapter...yeah! anyway, so instead of speed * variable, use speed + variable? – tora0515 Dec 18 '11 at 07:15
  • Got it! thanks. Should probably try out what I ask, before I ask it. thanks again. – tora0515 Dec 18 '11 at 07:18
  • My guess is that the text just wanted you to fit a 2 way anova without interaction. You're correct in that you would use speed+variable instead of speed*variable. – Dason Dec 18 '11 at 07:19
  • Then in the next section of this chapter I will go over two-way anova with replication. When I try it in R, I should use speed*variable (or whatever the example variables are...). Look at all the learning~! Haha. Thanks again for the help. – tora0515 Dec 18 '11 at 07:31
  • You should look at the answer that I built for your other similar question. I addressed how to move to a non-factor coding so that you avoid this problem of saturating your model with parameters. – IRTFM Dec 18 '11 at 14:09
  • @Dason Hi there. Is the reason we need more data that, when you include the interactions, the total degrees of freedom due to treatments would be as large as or larger than the number of observations? Therefore `DF_error = DF_total - DF_treatments` would be a nonsense result. Is this the right idea? Thanks – JasTonAChair May 24 '22 at 13:08
  • @JasTonAChair I think even trying to get into the degrees of freedom muddles things up. The argument is that the model fits a separate mean for each group but each group only has one observation. Bringing it down to something similar but easier to think about (which I did in my post), it's like trying to compare two groups when you only have a single measurement from each. You have no way to estimate the variation at all. – Dason May 24 '22 at 17:53
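The two-group analogy from the answer can be sketched as a toy example. The data frame `two` and its values are made up for illustration (the numbers are simply borrowed from the first column of the question's data), and this is not part of the original analysis.

```r
# Two groups, one measurement each (illustrative values only)
two <- data.frame(
  group = factor(c("a", "b")),
  y     = c(41, 65)
)

# One mean per group: two parameters for two observations
fit_two <- lm(y ~ group, data = two)

res_df_two <- df.residual(fit_two)             # 0
max_abs_res <- max(abs(residuals(fit_two)))    # numerically zero: the fit is exact
```

The group means reproduce the two observations exactly, so there is nothing left over from which to estimate within-group variation, and no test of the difference is possible.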