As part of a project, I need to run an ANOVA between the various columns of a CSV file. Is there a way to write a loop that runs the ANOVA across all the columns instead of doing each one individually?
Right now I am using the following code:

anova(colx,col1)
anova(colx,col2)
.
.
.
anova(colx,coln)

I want to automate this process and select the columns which give the maximum F value.

Pokechu22
  • one approach would combine `combn()`, `lapply()`, `anova()`, some extraction via `[[` and then searching for the max statistic...without sample data, that's as far as I'm going to go – Chase Sep 21 '14 at 01:12
  • Could you be a little more specific please? – Data noobie Sep 21 '14 at 01:17
  • What version of `anova()` accepts column names like that? Try to make an actual [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Sep 21 '14 at 02:29

1 Answer

If `ddf` is your data frame containing all the columns (`mtcars` is used as an example here), try:

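ddf = mtcars
maxfval=0; a=1; b=1    # best F value so far and the column indices that gave it
len= length(ddf)       # number of columns in the data frame
# try every ordered pair of distinct columns: i is the response, j the predictor
for(i in 1:len) for(j in 1:len){
    if(i!=j){
        # "F value" column of the ANOVA table; row 1 is the predictor term
        fval = anova(aov(ddf[,i]~ddf[,j]))[["F value"]][1]
        if(fval>maxfval) {maxfval=fval; a=i;b=j}
    }
}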

cat('\nMax F value=',maxfval, '\nWith columns=',a,',',b,'\n')

Output:

Max F value= 130.9989 
With columns= 3 , 2 
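
If, as in the question, one column is fixed (`colx`) and only the other column varies, the double loop can be reduced to a single pass. The following is a minimal sketch under some assumptions, not a definitive implementation: `mpg` is a hypothetical stand-in for `colx`, the names `response`, `predictors`, `f_values` and `best` are made up for illustration, and the column names are assumed to be syntactically valid R names. Because the `mtcars` columns are numeric, each fit is effectively a one-predictor regression; wrapping the predictor in `factor()` would be needed to treat it as a grouping variable.

```r
ddf <- mtcars
response <- "mpg"   # hypothetical choice standing in for colx

# every column except the response is a candidate predictor
predictors <- setdiff(names(ddf), response)

# F statistic of the one-predictor model response ~ predictor, for each candidate
f_values <- sapply(predictors, function(p) {
  fit <- aov(reformulate(p, response = response), data = ddf)
  anova(fit)[["F value"]][1]   # row 1 is the predictor term
})

# candidates ranked by F value, and the name of the best one
print(sort(f_values, decreasing = TRUE))
best <- names(f_values)[which.max(f_values)]
cat("Max F value =", max(f_values), "with column", best, "\n")
```

Keeping all the F values in a named vector, rather than tracking only the running maximum, makes it easy to inspect the runners-up as well.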
rnso