I have found a few very similar Stack Overflow questions, but the answers are not what I am looking for ("Loop through columns and apply ddply" and "Aggregate / summarize multiple variables per group (i.e. sum, mean, etc)").
The main difference is that those answers sidestep the problem: they drop the for loop (or apply) and use aggregate (or similar) instead. However, I already have a large chunk of code working smoothly to produce various summaries, stats, and plots, so what I would really like is to get a loop or function working. The issue I am currently facing is going from the column name stored as a string in q inside the loop to the actual column (get() is not working for me). See below.
My data set is similar to the one below, but with 40 features:
Subject <- c(rep(1, times = 6), rep(2, times = 6))
GroupOfInterest <- letters[rep(1:3, times = 4)]
Feature1 <- sample(1:20, 12, replace = TRUE)
Feature2 <- sample(400:500, 12, replace = TRUE)
Feature3 <- sample(1:5, 12, replace = TRUE)
df.main <- data.frame(Subject, GroupOfInterest, Feature1, Feature2,
                      Feature3, stringsAsFactors = FALSE)
My attempts so far have used a for loop:
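library(plyr)  # provides ddply() and summarise()
Feat <- colnames(df.main)[3:5]
for (q in Feat) {
  # Attempt: average the column named by q within each group (this is the part that fails)
  df_sum <- ddply(df.main, ~GroupOfInterest + Subject,
                  summarise, q = mean(get(q)))
}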
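I am hoping for output with one row per GroupOfInterest and Subject combination and a column holding the mean of each feature (although I realize that, as written, a separate merge step would be needed to combine the per-feature results).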
However, depending on how I do it, I either get an error ("Error in get(q) : invalid first argument") or it averages all values of a feature instead of grouping by Subject and GroupOfInterest.
I have also tried using lists and lapply, but I am running into similar difficulties.
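For illustration, here is a minimal sketch of the shape the lapply route could take, assuming plyr is installed. Instead of summarise, ddply is given an anonymous function that receives each group's sub-data-frame, so the column can be looked up by its string name with `[[q]]` rather than `get(q)`. The names `per_feature` and `piece` are made up for this example, and `set.seed(42)` is only there to make the sample data reproducible.

```r
library(plyr)

set.seed(42)  # assumption: fixed seed so the example data is reproducible
df.main <- data.frame(
  Subject         = c(rep(1, times = 6), rep(2, times = 6)),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1        = sample(1:20, 12, replace = TRUE),
  Feature2        = sample(400:500, 12, replace = TRUE),
  Feature3        = sample(1:5, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

Feat <- colnames(df.main)[3:5]

# One summary data frame per feature; `piece` is the subset of rows for one
# GroupOfInterest/Subject combination, and piece[[q]] pulls the column named by q.
per_feature <- lapply(Feat, function(q) {
  ddply(df.main, ~GroupOfInterest + Subject, function(piece) {
    out <- data.frame(mean(piece[[q]]))
    names(out) <- q  # name the result column after the feature
    out
  })
})

# Merge the per-feature summaries on the grouping columns.
df_sum <- Reduce(
  function(a, b) merge(a, b, by = c("GroupOfInterest", "Subject")),
  per_feature
)
```

With the sample data this should give one row per GroupOfInterest and Subject combination and one mean column per feature. With 40 features the repeated merge may be slow, so this is a sketch of the idea rather than a tuned solution.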
From what I have gathered, the issue is that ddply expects the bare column name Feature1. When I loop, I end up providing either the string "Feature1" or the values themselves (1, 14, 14, 16, 17, ...), which are no longer part of the data frame and so cannot be grouped by Subject and GroupOfInterest.
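To make the string-versus-column distinction concrete, here is a small base R sketch (no plyr needed) of what I understand the difference to be. The names `pieces`, `group_means`, and `overall_mean` are hypothetical, and `set.seed(42)` is only there so the sample data is reproducible.

```r
set.seed(42)  # assumption: fixed seed so the example data is reproducible
df.main <- data.frame(
  Subject         = c(rep(1, times = 6), rep(2, times = 6)),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1        = sample(1:20, 12, replace = TRUE),
  Feature2        = sample(400:500, 12, replace = TRUE),
  Feature3        = sample(1:5, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

q <- "Feature1"  # the column name as a string, as it appears in the loop

# Pulling the column out of the full data frame loses the grouping:
# this is a single mean over all 12 values.
overall_mean <- mean(df.main[[q]])

# Splitting first keeps each group's rows together; the string is then
# resolved to a column inside each piece.
pieces <- split(df.main,
                list(df.main$GroupOfInterest, df.main$Subject),
                drop = TRUE)
group_means <- vapply(pieces, function(piece) mean(piece[[q]]), numeric(1))
```

Here `group_means` is a named vector with one entry per combination (names such as "a.1"), while `overall_mean` is the ungrouped average I keep getting by accident.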
Thanks so much for any help you can offer with solving this problem and teaching me how this process works.