I have already done this analysis in SAS and am trying to replicate it in R, but I am new to R and know virtually nothing right now. I have tried a bunch of things but seem to get an error everywhere I go. I will try to simplify things, because I figure if I can make it work on a small scale I can extrapolate it to a larger scale.

Basically I have this huge data set of subjects, each with a value for every metabolite. I want to run an ANOVA on ALL of these metabolites (there are 600+ of them), find their p-values, and put them into a nice table with the metabolite label and the p-value. Here is an example of what the data could look like.

Subject #  Treatment  Antibiotic  Metabolite1  Metabolite2  ...  Metabolite600
MG_1       MD         No          1.257        2.578        ...  5.12
MG_2       MS         1SS         3.59         1.052        ...  1.5201
MG_3       MD1SS      No          1.564        1.7489       ...  1.310
etc...

I know I can run:

fit1 <- aov(Metabolite1 ~ TREATMENT * ANTIBIOTIC, data=data1)

to calculate it for just the first metabolite. I am trying a for loop just to try it out. Basically I want to know if I can use the aov function without having to copy/paste it and type in 1 to 600 for everything.

In SAS I could write a macro variable and assign it a number, so that when I make a name I could simply say Metabolite&i for the y value and fit&i to save the results. Is there any way to do this in R?

I've tried doing Metabolite[i] with a For (i in 1:20) but that doesn't work. Is there any way to actually reference the i in a loop? What is the proper syntax if there is?

Edit: I really don't know how to make this any simpler than it is, my data set is huge, I literally only have about 3 lines of code right now.

library(gdata)

testing = read.xls("~data1", sheet=1)

fit1 <- aov(Metabolite1 ~ TREATMENT * ANTIBIOTIC, data=data1)

summary(fit1)

This is literally all I have. As I mentioned above I tried doing

For (i in 1:20) {
fit[i] <- aov(Metabolite[i] ~ TREATMENT * ANTIBIOTIC, data=data1)
}

which does NOT work. It just says object Metabolite not found. It totally ignores my reference to the i value. I am just trying to start small at first.

Leon
    Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Nov 10 '16 at 21:39
    (1) You should stop trying to fit models and spend some time learning about basic data structures, flow control and functions in R. e.g. `paste(paste0("Metabolite",i)," ~ TREATMENT * ANTIBIOTIC")` you can build formulas from strings. (2) A better long term approach would be to reshape your data to collapse all the metabolite variables into a single column, and then you can do this sort of thing compactly via things like `lapply`. – joran Nov 10 '16 at 22:00
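
The first suggestion in the comment above, building each formula from a string, can be sketched in base R roughly as follows. This is only a minimal sketch: the toy `data1` is made-up stand-in data (three metabolites rather than 600+), and the names `metabolites`, `fits`, `p_rows` and `p_table` are hypothetical, not from the original post.

```r
# Hypothetical toy data standing in for the real data1
set.seed(42)
data1 <- data.frame(
  TREATMENT   = factor(rep(c("MD", "MS", "MD1SS"), times = 8)),
  ANTIBIOTIC  = factor(rep(c("No", "1SS"), each = 12)),
  Metabolite1 = rnorm(24),
  Metabolite2 = rnorm(24),
  Metabolite3 = rnorm(24)
)

# Column names to loop over; on the real data this would be 1:600
metabolites <- paste0("Metabolite", 1:3)

# Build the formula as a string, convert it, and store each fit by name
fits <- list()
for (m in metabolites) {
  f <- as.formula(paste(m, "~ TREATMENT * ANTIBIOTIC"))
  fits[[m]] <- aov(f, data = data1)
}

# summary() of an aov fit is a list whose first element is the ANOVA table;
# pull the "Pr(>F)" column for every term except the residuals
p_rows <- lapply(fits, function(fit) {
  tab   <- summary(fit)[[1]]
  terms <- trimws(rownames(tab))
  keep  <- terms != "Residuals"
  setNames(tab[keep, "Pr(>F)"], terms[keep])
})

# One row per metabolite, one column per model term
p_table <- do.call(rbind, p_rows)
p_table
```

Storing the fits in a named list plays the role of `fit&i` in the SAS macro: `fits[["Metabolite2"]]` retrieves the second model.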

1 Answer

It's difficult to debug this without data, but I would try something like the following:

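library(tidyverse)
library(broom)

# Reshape to long format (one row per subject/metabolite pair), then nest
# so that each metabolite gets its own data frame.
# Note: each %>% must end a line, not start one, or R stops parsing early.
data_nested <- data1 %>%
  gather(key = MetaboliteType, value = Metabolite,
         -Subject, -Treatment, -Antibiotic) %>%
  group_by(MetaboliteType) %>%
  nest()

# Two-way ANOVA with interaction for a single metabolite
aov_fun <- function(df) {
  aov(Metabolite ~ Treatment * Antibiotic, data = df)
}

# Fit every metabolite, tidy each fit, and unnest into one table
(results <- data_nested %>%
  mutate(fit = map(data, aov_fun), tidy = map(fit, tidy)) %>%
  unnest(tidy))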
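
To get from this kind of tidied output to just the metabolite label and the p-value, something like the sketch below might work. It is self-contained and makes several assumptions: `toy` is made-up stand-in data with three metabolites, `pivot_longer()` is swapped in for `gather()`, and `nest(data = -MetaboliteType)` replaces the `group_by() %>% nest()` step. The names `toy`, `tidied` and `p_table` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical toy data standing in for data1
set.seed(1)
toy <- data.frame(
  Subject     = paste0("MG_", 1:24),
  Treatment   = rep(c("MD", "MS", "MD1SS"), times = 8),
  Antibiotic  = rep(c("No", "1SS"), each = 12),
  Metabolite1 = rnorm(24),
  Metabolite2 = rnorm(24),
  Metabolite3 = rnorm(24)
)

p_table <- toy %>%
  # Long format: one row per subject/metabolite pair
  pivot_longer(starts_with("Metabolite"),
               names_to = "MetaboliteType", values_to = "Metabolite") %>%
  # One nested data frame per metabolite
  nest(data = -MetaboliteType) %>%
  # Fit the ANOVA per metabolite and tidy each fit
  mutate(fit    = map(data, ~ aov(Metabolite ~ Treatment * Antibiotic, data = .x)),
         tidied = map(fit, tidy)) %>%
  select(MetaboliteType, tidied) %>%
  unnest(tidied) %>%
  # The residual row carries no p-value, so drop it
  filter(term != "Residuals") %>%
  select(MetaboliteType, term, p.value)

p_table
```

Each metabolite contributes one row per model term (both main effects and the interaction), so the real data would give roughly three rows per metabolite.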
Phil