Consider dat1, created here:
set.seed(123)
dat1 <- data.frame(Region = rep(c("r1","r2"), each = 100),
                   State = rep(c("NY","MA","FL","GA"), each = 10),
                   Loc = rep(c("a","b","c","d","e","f","g","h"), each = 5),
                   ID = rep(c(1:10), each = 2),
                   var1 = rnorm(200),
                   var2 = rnorm(200),
                   var3 = rnorm(200),
                   var4 = rnorm(200),
                   var5 = rnorm(200))
dat1 has measurements for 5 variables, and observations (IDs) can be grouped according to 3 grouping variables: Loc, State, and Region.
I have to perform various tasks on each response variable/grouping variable combination, so I have been writing functions to make this easier and keep my analysis tidy. I am using the rstatix package for several operations. The following function conducts a Kruskal-Wallis test on the data I specify, calculates the effect size efsz, and returns the results in a single data frame res:
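library(rstatix)
KruskTest <- function(dat, groupvar, var){
  # Kruskal-Wallis test and effect size for one response/grouping pair
  kt <- dat %>% kruskal_test(get(var) ~ get(groupvar))
  efsz <- dat %>% kruskal_effsize(get(var) ~ get(groupvar))
  # Combine both results; res is assigned globally so the caller can use it
  res <<- cbind(kt, efsz[,3:5])
  res[1,1] <<- var
  res$groupvar <<- groupvar
  res <<- res[,c(10,1:9)]  # move groupvar to the first column
}
KruskTest(dat = dat1, groupvar = "Region", var = "var1")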
Now I can use that function to loop over each response variable and get the results for a grouping variable (the example shows it for Region) in a single data frame, which is what I need:
vars <- paste(names(dat1[,5:9]))
a <- data.frame()
for(i in vars){
  KruskTest(dat = dat1, groupvar = "Region", var = i)
  a <- rbind(a, res)
}
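The loop depends on the function writing res into the global environment. A minimal sketch of the same collect-and-stack pattern with a function that returns its result instead, using base R's kruskal.test() as a stand-in for the rstatix calls. The helper name kw_row and the small toy data frame are hypothetical and only serve the illustration:

```r
set.seed(123)
# Toy data standing in for dat1 (hypothetical, smaller)
toy <- data.frame(Region = rep(c("r1", "r2"), each = 20),
                  var1 = rnorm(40),
                  var2 = rnorm(40))

# Hypothetical helper: one Kruskal-Wallis test, returned as a one-row data frame
kw_row <- function(dat, groupvar, var) {
  # Build the formula from the column names, e.g. var1 ~ Region
  kt <- kruskal.test(reformulate(groupvar, response = var), data = dat)
  data.frame(groupvar = groupvar,
             var = var,
             statistic = unname(kt$statistic),
             df = unname(kt$parameter),
             p = kt$p.value)
}

vars <- c("var1", "var2")
# One row per response variable, stacked into a single data frame
out <- do.call(rbind, lapply(vars, function(v) kw_row(toy, "Region", v)))
print(out)
```

Returning the row and stacking with do.call(rbind, ...) avoids the global res, so the function can be reused without side effects.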
The KruskTest() loop works great for the Kruskal-Wallis test. Now I want to make a very similar function that runs Dunn's test, but watch what happens:
dunn <- function(dat, groupvar, var){
  res <<- dat %>% rstatix::dunn_test(get(var) ~ get(groupvar), p.adjust.method = "bonferroni")
}
dunn(dat = dat1, groupvar = "Region", var = "var1")
Error: Can't extract columns that don't exist.
x The column `get(groupvar)` doesn't exist.
Outside of a user-written function, you specify the data for dunn_test() and kruskal_test() in exactly the same way. So what is the difference between specifying variables in these two functions, and why does the first one work but not the second?
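One thing the error message hints at is that the literal text get(groupvar) is being treated as a column name. A small base-R sketch of what the formula actually contains in each case; the values assigned to var and groupvar are stand-ins for the function arguments, and the idea that dunn_test() looks columns up by the formula's term text is an assumption based on the error message, not something checked against the rstatix source:

```r
# Stand-ins for the function arguments
var <- "var1"
groupvar <- "Region"

# Formula as written inside the functions: the terms are unevaluated calls
f_get <- get(var) ~ get(groupvar)
lhs_get <- deparse(f_get[[2]])  # left-hand side as text
rhs_get <- deparse(f_get[[3]])  # right-hand side as text

# Formula built from the strings: the terms are the column names themselves
f_str <- reformulate(groupvar, response = var)
lhs_str <- deparse(f_str[[2]])
rhs_str <- deparse(f_str[[3]])

print(c(lhs_get, rhs_get))
print(c(lhs_str, rhs_str))
```

If that assumption holds, passing reformulate(groupvar, response = var) to dunn_test() inside the function may avoid the error, since the formula would then name real columns of dat1. This is untested here.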