I am new to R and statistics and am trying to run a two-factor ANOVA on a dataset in a CSV file, where the values of each factor are in their own column. I was using:

> mydata <- read.csv("myfile.csv")
> model = lm(result ~ factor1 * factor2, data=mydata)

As a check, I tried the ChickWeight data from R's built-in sample datasets, first directly and then after writing it to a CSV file and reading it back.

> anova(with(ChickWeight, lm(weight ~ Time + Diet)))
Analysis of Variance Table

Response: weight    
           Df  Sum Sq Mean Sq  F value    Pr(>F)   
Time        1 2042344 2042344 1576.460 < 2.2e-16 *** 
Diet        3  129876   43292   33.417 < 2.2e-16 *** 
Residuals 573  742336    1296
> write.csv(file="ChickWeight.csv", x=ChickWeight, row.names=FALSE)
> data = read.csv("ChickWeight.csv", header=TRUE)
> anova(lm(weight ~ Time + Diet, data=data))
Analysis of Variance Table

Response: weight
            Df  Sum Sq Mean Sq  F value    Pr(>F)    
Time        1 2042344 2042344 1537.033 < 2.2e-16 ***
Diet        1  108177  108177   81.412 < 2.2e-16 ***
Residuals 575  764036    1329                       

Notably, the Diet term drops from 3 degrees of freedom to 1 when the data are read from the CSV file into a data frame. What am I missing here?

  • Compare the structure of the two datasets. The original has factors and ordered factors. I would assume that for one structure the model needs more parameters, which would explain the discrepancy in degrees of freedom. – Roman Luštrik May 06 '15 at 08:21
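
The comparison suggested in the comment above can be sketched roughly as follows. This is only an illustration: it writes to a temporary file (the `csv_path` and `from_csv` names are made up here) rather than to the `ChickWeight.csv` used in the question.

```r
# Round-trip the built-in dataset through a CSV file, as in the question.
csv_path <- tempfile(fileext = ".csv")  # hypothetical path, for illustration only
write.csv(ChickWeight, file = csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# Show the first class of every column, before and after the round trip.
print(sapply(ChickWeight, function(col) class(col)[1]))
print(sapply(from_csv, function(col) class(col)[1]))

# Diet is a factor in the built-in data but a plain number after read.csv.
print(is.factor(ChickWeight$Diet))
print(is.factor(from_csv$Diet))
```

`str(ChickWeight)` and `str(from_csv)` give the same information in more detail.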

1 Answer

Got the clue from this post: Why do R and statsmodels give slightly different ANOVA results?

When the data are read from the CSV file, the Diet column becomes an ordinary numeric column, so lm fits a single slope for it, which uses 1 degree of freedom. For ANOVA it has to be a factor, which gets one parameter per level beyond the first: 3 degrees of freedom for the 4 diets. (I am still not clear why a factor is a separate class in R and why R cannot take care of this automatically: inexact binary representation of floats?) So the solution was:

> data$Diet = factor(data$Diet)
> anova(lm(weight ~ Time + Diet, data=data))
Analysis of Variance Table

Response: weight
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
Time        1 2042344 2042344 1576.460 < 2.2e-16 ***
Diet        3  129876   43292   33.417 < 2.2e-16 ***
Residuals 573  742336    1296                       
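
As an alternative to converting the column after reading, the conversion can probably be done at read time with the `colClasses` argument of `read.csv`. The sketch below is illustrative only: it uses a temporary file, and the names `csv_path`, `chicks_num` and `chicks_fac` are made up here. It also fits the model both ways to show where the degrees of freedom come from.

```r
# Recreate the CSV file from the built-in dataset (temporary path for illustration).
csv_path <- tempfile(fileext = ".csv")
write.csv(ChickWeight, file = csv_path, row.names = FALSE)

# Default read: Diet comes back as a number, so lm fits one slope for it.
chicks_num <- read.csv(csv_path)
tab_num <- anova(lm(weight ~ Time + Diet, data = chicks_num))

# Read Diet as a factor straight away; other columns keep their default types.
chicks_fac <- read.csv(csv_path, colClasses = c(Diet = "factor"))
tab_fac <- anova(lm(weight ~ Time + Diet, data = chicks_fac))

# Degrees of freedom for the Diet term under each reading.
print(tab_num["Diet", "Df"])  # numeric Diet: a single slope
print(tab_fac["Diet", "Df"])  # factor Diet: number of levels minus one
print(nlevels(chicks_fac$Diet))
```

Either approach should give the same table as the built-in data; which one to use is mostly a matter of whether the column types are known before the file is read.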