Questions tagged [anova]

ANOVA is an acronym for "analysis of variance". It is a widely used statistical technique for analyzing the sources of variance within a data set.

Overview

Although ANOVA stands for ANalysis Of VAriance, it is about comparing the means of data from different groups. It is part of the general linear model, which also includes linear regression and ANCOVA. In matrix algebra form, all three are:

Y=XB+e

where Y is a vector of values for the dependent variable (which must be numeric), X is a matrix of values for the independent variables, B is a vector of coefficients to be estimated, and e is a vector of errors.
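As a rough illustration of how ANOVA compares group means by partitioning variance, the following sketch computes the one-way F statistic by hand using only the Python standard library. The function name `one_way_anova_f` and the sample data are made up for this example; in practice a library routine (for example `aov` in R) would normally be used instead.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Hypothetical helper for illustration only, not a library function.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Between-group sum of squares: distance of each group mean
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of each observation
    # around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # F is the ratio of the between-group and within-group mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F value indicates that the group means differ by more than the within-group scatter would suggest. Turning F into a p-value requires the F distribution, which is not part of the standard library and is left out of this sketch.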

Tag usage

  • SO questions on ANOVA should be about implementation and programming problems, not about the statistical or theoretical properties of the technique.

  • Consider whether your question might be better suited to CrossValidated, the StackExchange site for statistics, machine learning and data analysis.

In R, the function aov implements ANOVA. Note that the function anova does something else. See When should I use aov() and when anova()?

1456 questions
78
votes
7 answers

Extract p-value from aov

I am looking to extract the p-value generated from an anova in R. Here is what I am running: test <- aov(asq[,9] ~ asq[,187]) summary(test) Yields: Df Sum Sq Mean Sq F value Pr(>F) asq[, 187] 1 3.02 3.01951 12.333…
Btibert3
  • 38,798
  • 44
  • 129
  • 168
38
votes
1 answer

When should I use aov() and when anova()?

I have referred to much of online literature but it is increasing my confusion. Much of the discussion is too technical with terms unbalanced designs and I, II or III factor ANOVA and everything. I only know that aov() uses lm() internally and is…
Chadwick Robbert
  • 1,026
  • 1
  • 11
  • 23
38
votes
1 answer

ANOVA in python using pandas dataframe with statsmodels or scipy?

I want to use the Pandas dataframe to break down the variance in one variable. For example, if I have a column called 'Degrees', and I have this indexed for various dates, cities, and night vs. day, I want to find out what fraction of the variation…
wolfsatthedoor
  • 7,163
  • 18
  • 46
  • 90
17
votes
1 answer

scikit learn: how to check coefficients significance

I tried to do a LR with SKLearn for a rather large dataset with ~600 dummy and only a few interval variables (and 300 K lines in my dataset), and the resulting confusion matrix looks suspicious. I wanted to check the significance of the returned…
dadam
  • 181
  • 1
  • 1
  • 6
16
votes
2 answers

aov() error term in R: what's the difference bw Error(id) and Error(id/timevar) specification?

What is the difference between the aov(depvar~timevar+Error(id)) and the aov(depvar~timevar+Error(id/timevar)) formula specifications? These two variants produce slightly different results. The same question was once asked here:…
NeverTim
  • 161
  • 1
  • 1
  • 6
16
votes
6 answers

R error which says "Models were not all fitted to the same size of dataset"

I have created two generalised linear models as follows: glm1 <-glm(Y ~ X1 + X2 + X3, family=binomial(link=logit)) glm2 <-glm(Y ~ X1 + X2, family=binomial(link=logit)) I then use the anova function: anova(glm2,glm1) but get an error…
REnthusiast
  • 1,591
  • 3
  • 16
  • 18
15
votes
4 answers

How to do a Tukey HSD test with the Anova command (car package)

I'm dealing with an unbalanced design/sample and originally learned aov(). I know now that for my ANOVA tests I need to use the Type III Sum of Squares, which involves fitting with lm() rather than aov(). The problem is getting post-hoc…
leighadlr
  • 159
  • 1
  • 1
  • 5
15
votes
2 answers

Repeated-measures / within-subjects ANOVA in R

I'm attempting to run a repeated-measures ANOVA using R. I've gone through various examples on various websites, but they never seem to talk about the error that I'm encountering. I assume I'm misunderstanding something important. The ANOVA I'm…
vize
  • 281
  • 1
  • 2
  • 10
15
votes
3 answers

Homoscedasticity test for Two-Way ANOVA

I've been using var.test and bartlett.test to check basic ANOVA assumptions, among others, homoscedasticity (homogeneity, equality of variances). The procedure is quite simple for One-Way ANOVA: bartlett.test(x ~ g) # where x is numeric, and g is a…
aL3xa
  • 35,415
  • 18
  • 79
  • 112
15
votes
4 answers

Efficient algorithm for detecting different elements in a collection

Imagine you have a set of five elements (A-E) with some numeric values of a measured property (several observations for each element, for example "heart rate"): A = {100, 110, 120, 130} B = {110, 100, 110, 120, 90} C = { 90, 110, 120, 100} D = {120,…
Guido
  • 46,642
  • 28
  • 120
  • 174
13
votes
2 answers

invalid type (list) for variable

I am trying to run an anova model in R. I have a data file which contains 3 rows and 12 columns. Each row is data for a particular level of the explanatory variable. Cell [i,j] is the j'th response for level i. The file is ".dat" extension. I am…
nbk
  • 523
  • 1
  • 6
  • 20
13
votes
1 answer

Planned contrasts using ezANOVA output in R

I've been looking into using planned contrasts as opposed to post-hoc t-tests. I typically use ezANOVA (Type III ANOVA) but it seems that conducting planned contrasts using ezANOVA is not currently catered for. aov() on the other hand is a Type I…
Docconcoct
  • 2,040
  • 4
  • 28
  • 52
13
votes
1 answer

Do I need to set refit=FALSE when testing for random effects in lmer() models with anova()?

I am currently testing whether I should include certain random effects in my lmer model or not. I use the anova function for that. My procedure so far is to fit the model with a function call to lmer() with REML=TRUE (the default option). Then I…
lord.garbage
  • 5,884
  • 5
  • 36
  • 55
12
votes
1 answer

How to do one-way ANOVA in R with unequal sample sizes?

Trying to learn R. A question from an old stats text wants to know if there is a difference in break times at different construction sites. Trouble is, the text decided that each site employs a different number of workers. So, I am stuck and…
tora0515
  • 2,479
  • 12
  • 33
  • 40
12
votes
3 answers

Custom contrasts in R: contrast coefficient matrix or contrast matrix / coding scheme? And how to get there?

Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?" Intuitively, this contrast is expressed in terms of cell means as: c(1,0,-1) One or more of these…
tim
  • 3,559
  • 1
  • 33
  • 46