0

I wrote a loop in which I iterate through the columns of a given .csv and run an anova and posthoc test. I then combine the results of each into a data frame and export it to a .csv file. However, I cannot get rbind() to build my data.frame. Any help on this? Here is the script:

setwd("~/School/Lab/mice/sugar_study_2015/MG-RAST and Metagenassist/Trimmed/R. CSV")
#Save your Datasheet into variable X
x <- read.csv("T0_B_Class_Anova.csv")

x = x[1:9,]
x[is.na(x)] <- 0

DF.Anova <- data.frame()
DF.Tukey <- data.frame()

#Counts through the columns
for(i in 2:(ncol(x)-1)){
  columns <- names(x[i])
 
##Runs an ANOVA - 'Group' being a grouping factor
  anovaresult <- anova(aov(x[,i]~Group,data=x))
  
  DF.Anova <- rbind(DF.Anova, anovaresult)
 
  ##fix anova into data frame
  Famall = colnames(x)
  Famall = as.data.frame(Famall)
  Famall = Famall[2:83,]
  Famall = as.data.frame(Famall)
  DFanovanames = rep(Famall, each = 2)
  DFanovanames = as.data.frame(DFanovanames)
  #install.packages("tidyr")
  library(tidyr)
  anovanames = data.frame(Names=unlist(DFanovanames, use.names = FALSE))
  o.anovanames = dplyr::arrange(anovanames, Names)
###dont forget to change this**************************
  finalanova_BFT0 = cbind(rn = rownames(DF.Anova), DF.Anova, o.anovanames)
 
##Runs Tukeys Post-hoc test on Anova
  posthocresult <- TukeyHSD(aov(x[,i]~Group,data=x))
 
  DF.Tukey <- rbind(DF.Tukey, posthocresult$Group)
 
  ##fix tukey into data frame
  Famname = colnames(x)
  Famname = as.data.frame(Famname)
  Famname = Famname[2:83,]
  Famname = as.data.frame(Famname)
  DFposthocnames = rep(Famname, each = 3)
  DFposthocnames = data.frame(DFposthocnames)
  #install.packages("tidyr")
  library(tidyr)
  library(dplyr)
  posthocnames = data.frame(Names=unlist(DFposthocnames, use.names = FALSE))
  o.posthocnames = dplyr::arrange(posthocnames, Names)
###dont forget to change this****************************
  finalposthoc_BFT0 = cbind(rn = rownames(DF.Tukey), DF.Tukey, o.posthocnames)
                           
##Prints posthoc results into txt file
  print(columns)
  print(anovaresult)
  print(posthocresult)
}
 
write.csv(finalanova_BFT0, file="testfinalanova_BCT0")
write.csv(finalposthoc_BFT0, file="finalposthoc_BCT0")

You can find the sample .csv here

Haley
  • 119
  • 1
  • 3
  • 16
  • 1
    are you sure that your data is being read in correctly? your `setwd` call looks incorrect – tbradley Oct 20 '17 at 16:30
  • It would be easier to answer your question if you provided a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) rather than linking to the data. – tobiasegli_te Oct 20 '17 at 16:34
  • I have edited the dataset to include 3 columns. I don't want to take out anything from the script because I'm not sure where it's gong wrong. I've stared at this thing for an hour or two and cannot figure it out. – Haley Oct 20 '17 at 16:49

1 Answers1

1

Assuming your desired output is 2 dataframes with the summary results from the two different tests. You can accomplish this with the map function from the purrr package and the tidy function from the broom package. I saved the csv you posted and saved it as anova-question-data.csv. I would recommend verifying that your data is being read in correctly if you are going to use setwd. Here is the code i used to get the two data frames:

# read in the data
df <- read_csv(file = "anova-question-data.csv")

# create a list to loop over in the `map` call. 
loop_list <- colnames(df[,-1])

# create a list of data frames using the `tidy` function from `broom`
anova_list <- map(loop_list, function(x){
  anova_results <- anova(aov(df[[x]]~df[["Group"]]))

  # this tidies the results from the anova test and add a new 
  # column with the column name being tested. 
  # if bacteria is not your desired name, feel free to change it as 
  # it will not affect any of the rest of the code
  output <- broom::tidy(anova_results) %>%
    mutate(bacteria = x)
})

# use `do.call` to bind the dataframes in anova_list together
anova_df <- anova_list %>%
  do.call(rbind, .)

# repeat the exact same process only changing `anova` with `TukeyHSD`
posthoc_list <- map(loop_list, function(x){
  posthoc_results <- TukeyHSD(aov(df[[x]]~df[["Group"]]))

  output <- broom::tidy(posthoc_results) %>%
    mutate(bacteria = x)
 })

posthoc_df <- posthoc_list %>%
  do.call(rbind, .)

This gives you the two following outputs (I am only going to print the first 5 lines):

> head(anova_df, 5)
         term    df        sumsq       meansq  statistic   p.value            bacteria
 1 df[["Group"]]  2 1.265562e-07 6.327809e-08 0.02650174 0.9739597       Acidobacteria
 2     Residuals  6 1.432617e-05 2.387695e-06         NA        NA       Acidobacteria
 3 df[["Group"]]  2 9.332880e-02 4.666440e-02 0.84001916 0.4768300      Actinobacteria
 4     Residuals  6 3.333096e-01 5.555159e-02         NA        NA      Actinobacteria
 5 df[["Group"]]  2 9.114521e-04 4.557261e-04 1.08994816 0.3946484 Alphaproteobacteria


> head(posthoc_df, 5)
           term comparison      estimate     conf.low   conf.high adj.p.value       bacteria
1 df[["Group"]]      HF-CO  2.234233e-04 -0.003647709 0.004094556   0.9829095  Acidobacteria
2 df[["Group"]]     HFS-CO -4.903533e-05 -0.003920168 0.003822097   0.9991677  Acidobacteria
3 df[["Group"]]     HFS-HF -2.724587e-04 -0.004143591 0.003598674   0.9747264  Acidobacteria
4 df[["Group"]]      HF-CO  2.345822e-01 -0.355886402 0.825050849   0.4856694 Actinobacteria
5 df[["Group"]]     HFS-CO  1.907267e-01 -0.399741917 0.781195333   0.6084817 Actinobacteria
tbradley
  • 2,210
  • 11
  • 20