forloop/foreach to run each var in data using same model in r

Question

My data looks like this

q3a  q3b q3c q3d q3e ... q10d grp
1    2   3   4   5  ...  1    1
2    1   2   3   4       2    1
3    2   1   5   2  ...  1    2
2    1   2   1   1  ...  2    2 
2    3   4   1   2  ...  3    3

I want to run one-way anova and duncan post hoc test for each question. For q3a, the code would be

library("DescTools")
q3a <- aov(q3a ~ grp,data = pgy)
PostHocTest(q3a,method = "duncan")

How can I write a foreach loop to iterate the same models for each variable in my data?

## List of variables:
> dput(names(duncan))
c("q3a", "q3b", "q3c", "q3d", "q3e", "q4a", "q4b", "q4d", "q4e", 
"q4f", "q4g", "q4h", "q4i", "q4j", "q4k", "q4l", "q4m", "q5b", 
"q5c", "q5d", "q6b", "q6c", "q6f", "q7b", "q7d", "q7f", "q7g", 
"q7h", "q7i", "q8a", "q8b", "q9a", "q9b", "q9c", "q9d", "q10a", 
"q10b", "q10c", "q10d")

Thanks!

tbradley · Answer 1 · 2018-01-17T19:30:37.257

This is not a method using foreach but rather it is a method using the dplyr package and the purrr package. This method will allow you to extend what is already done for further analyses.

# first load the packages
library(dplyr)
library(purrr)

# read in data
pgy <- read.table(text = "
1    2   3   4   5  1    1
2    1   2   3   4  2    1
3    2   1   5   2  1    2
2    1   2   1   1  2    2 
2    3   4   1   2  3    3", 
                  col.names = c("q3a", "q3b", "q3c", 
                                "q3d", "q3e", "q10d", "grp"))

# data manipulation to get dataframe in required format
results <- pgy %>% 

# use this mutate call to convert grp to factor
# you may not need to do this if that column is already a factor
mutate(grp = factor(grp)) %>% 

# this gather function will convert the table from wide to long
# this is necessary to do the group_by() %>% nest() combo
# the q3a:q10d should work for you if that is the range of questions
# if not then you can change it to the first and last column names
# containing question data - these should be unquoted
gather(key = "question", value = "result", q3a:q10d) %>% 

# this creates groupings based on which question
group_by(question) %>% 

# this creates nested dataframes for each group - known as list-columns
# the column name of the nested dataframes will be data unless 
# explicitly specified
nest() %>% 

# this is where it gets a little tricky
# you can use a mutate call to create a new column, but since your
# data for each column is now stored in a nested list, you need to use
# map from the purrr package to operate on it.
# the ~ symbol is a short cut to allow you to avoid writing an anonymous function
# the .x in the aov function represents the data object in the map call
# notice that data is also the name of the listed column 
mutate(aov_test = map(data, ~ aov(result ~ grp, data = .x)),

       # this is the real magic of this method, you can use the variable created 
       # in the first line on the mutate function as the input to the 
       # second line for your PostHocTest input
       duncan = map(aov_test, ~DescTools::PostHocTest(x = .x, method = "duncan")))

You can then access the results of the duncan test like this:

results$duncan[[1]]

or to view them all:

map(1:nrow(results), ~ results$duncan[[.x]])

For the aov test, you can use the broom package to get the results in a tidy data frame like this:

results %>%
  unnest(map(aov_test, broom::tidy), .drop = TRUE)

You can check out the other major broom functions (augment() and glance()) to see other info about the model. Unfortunately, it looks like there is no tidier for the duncan test you are doing.

Thanks for this detailed solution! I just found that the problem can be easily solved by using `lapply` — mandy, Jan 17 '18 at 19:44
I'm glad you found a solution that worked! Just so you know, the `map` function is very similar to `lapply` (if not the same thing). If this is as far as you have to go then the `lapply` method may be easier, but if you have to extend to more models than this method can be very useful. — tbradley, Jan 17 '18 at 19:51

score 0 · Answer 2 · answered Jan 17 '18 at 19:49

Here is a neat solution for my problem using lapply:

## list of names of my data 
> dput(names(duncan))
c("grp", "q3a", "q3b", "q3c", "q3d", "q3e", "q4a", "q4b", "q4d", 
"q4e", "q4f", "q4g", "q4h", "q4i", "q4j", "q4k", "q4l", "q4m", 
"q5b", "q5c", "q5d", "q6b", "q6c", "q6f", "q7b", "q7d", "q7f", 
"q7g", "q7h", "q7i", "q8a", "q8b", "q9a", "q9b", "q9c", "q9d", 
"q10a", "q10b", "q10c", "q10d")

## use lapply to run model for each variable 
results <- lapply(duncan[,-1], function(x) PostHocTest(aov(x ~ pgy, data = duncan), method = "duncan"))

## extract results for each variable
results$q3a

  Posthoc multiple comparisons of means : Duncan's new multiple range test 
    95% family-wise confidence level

$pgy
                 diff      lwr.ci     upr.ci   pval    
PGY2-PGY1  0.10716048  0.04104755 0.17327342 0.0012 ** 
PGY3-PGY1  0.05197694 -0.01485439 0.11880828 0.1274    
PGY3-PGY2 -0.05518354 -0.12593368 0.01556661 0.1263    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

results$q3b
etc..

I found a similar solution here: Linear Regression loop for each independent variable individually against dependent

forloop/foreach to run each var in data using same model in r

2 Answers2