Issue: I'm having trouble getting dplyr's summarize function to recognize that the loop variable holds a column name.
for (i in colnames(df)) {
  temp_frame <- df %>%
    select(group, i) %>%
    group_by(group) %>%
    summarize(yes = sum(i == "1"), no = sum(i == "0"))
}
Full project: I have a data frame of variables I'm attempting to run stats on. Because it's so long, I'm trying to write a for loop that pairs the independent variable column (group) with each dependent variable column and builds a table for each pair, formatted so that I can run Fisher's exact test on it. Without a for loop, the code for Fisher's exact test runs perfectly, but once I build the tables inside the for loop, the summarize function doesn't seem to understand that i is a column name within temp_frame.
Here's the full code, with stand-in data in place of the original:
library(dplyr)

# create starting data frame
df <- data.frame(
  group = c("control", "control", "control", "experimental", "experimental", "experimental"),
  a = c(1, 0, 1, 1, 1, 0),
  b = c(0, 1, 0, 0, 1, 1)
)

# create empty stats data frame
stats <- data.frame(name = "", fish = "")

for (i in colnames(df)) {
  if (i == "group") { # skip over the "group" column to avoid trying to make a group vs group test
    next
  } else {
    temp_frame <- df %>%
      select(group, i) %>%
      group_by(group) %>%
      summarize(yes = sum(i == "1"), no = sum(i == "0")) %>% # this is the step that returns zeros
      select(-group)
    stats[nrow(stats) + 1, ] = c(i, fisher.test(temp_frame)$p.value) # add each p-value to the stats data frame
  }
}
Outside of the for loop, were I to just specify the a column in the summarize function, the temp_frame would first look like this, which is exactly what I want:
yes | no |
---|---|
2 | 1 |
2 | 1 |
but instead, I'm just getting
yes | no |
---|---|
0 | 0 |
0 | 0 |
I think it's because summarize isn't recognizing that i refers to the column a and is instead comparing the string "a" itself to "1" and "0".
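That suspicion can be checked outside dplyr. The snippet below is only a minimal illustration in base R, using the values of column a from the stand-in data; it assumes nothing beyond ordinary string comparison.

```r
# i holds the column *name*, not the column's values
i <- "a"
a <- c(1, 0, 1, 1, 1, 0)

# Comparing the name to "1" yields a single FALSE, so the sum is 0
name_matches <- sum(i == "1")

# Comparing the actual values to "1" counts the ones in the column
value_matches <- sum(a == "1")

print(name_matches)
print(value_matches)
```

If summarize sees i as a plain character value, every group would get the same zero count, which matches the table above.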
How do I tell it that i represents a column name within the temp_frame df?
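One possible direction, offered as a sketch rather than a confirmed fix for the full project: dplyr provides the `.data` pronoun, which lets a string stored in a variable index a column of the current data frame, and `all_of()` does the equivalent job inside `select()`. The example below assumes the stand-in df from above and fixes i to "a" for a single iteration.

```r
library(dplyr)

# stand-in data from the question
df <- data.frame(
  group = c("control", "control", "control", "experimental", "experimental", "experimental"),
  a = c(1, 0, 1, 1, 1, 0),
  b = c(0, 1, 0, 0, 1, 1)
)

i <- "a" # one iteration of the loop, fixed here for illustration

temp_frame <- df %>%
  select(group, all_of(i)) %>%   # all_of() treats i as a column name
  group_by(group) %>%
  summarize(
    yes = sum(.data[[i]] == 1),  # .data[[i]] looks up the column named by i
    no  = sum(.data[[i]] == 0)
  ) %>%
  select(-group)

print(temp_frame)
```

Inside the original loop, the same two substitutions (`all_of(i)` in select and `.data[[i]]` in summarize) would presumably be enough, with the rest of the loop left unchanged.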