In R, I want to split a data frame along a factor variable and then apply a function to the data for each level of that variable. I want to do all of this inside my own function, but somehow the data aren't being split.
I don't understand all of the nuances of passing arguments to functions nested within other functions. I originally tried to do this with dplyr, but couldn't pass the arguments through to the dplyr calls inside my function.
Here's my function:
myFun <- function(dat, strat.var, PSU, var1){
  strata <- as.character(unique(dat[, strat.var]))
  N.h <- length(strata)
  sdat <- with(dat, split(dat, strat.var))
  fun1 <- function(x){ length(unique(x[, PSU])) }
  fun2 <- function(x){ sum(tapply(x[, var1], x[, PSU], mean)) }
  ns <- sapply(sdat, fun1)
  mns <- sapply(sdat, fun2)
  dfx <- data.frame(cbind(stratum=strata, ns=ns, mns=mns))
  return(list(N.h = N.h, out=dfx))
}
To demonstrate, I use the built-in warpbreaks data, but my actual data set has 8 levels of "strat.var", and nested within those are between 2 and 10 levels of "PSU".
myFun(dat=warpbreaks, strat.var="wool", PSU="tension", var1="breaks")
# $N.h
# [1] 2
# $out
#   stratum ns              mns
# 1       A  3 84.4444444444444
# 2       B  3 84.4444444444444
But this isn't correct, because doing it by hand I get:
sdat <- with(warpbreaks, split(warpbreaks, wool))
fun1 <- function(x, PSU){ length(unique(x[, PSU])) }
fun2 <- function(x, PSU, var1){ sum(tapply(x[, var1], x[, PSU], mean)) }
sapply(sdat, fun1, PSU="tension")
# A B
# 3 3
sapply(sdat, fun2, PSU="tension", var1="breaks")
#        A        B
# 93.11111 75.77778
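The difference between the two results can be narrowed down with a small diagnostic sketch comparing what each split() call receives. Inside myFun, strat.var holds the string "wool", and with() looks for a column literally named strat.var rather than the column wool, so split() appears to get a single string that is recycled into one group. The names by_string and by_column are illustrative only, and this is a sketch of the suspected cause rather than a confirmed fix.

```r
strat.var <- "wool"

# Same call shape as inside myFun: with() finds no column called strat.var,
# so split() is handed the string "wool", which is recycled into one group.
by_string <- with(warpbreaks, split(warpbreaks, strat.var))
names(by_string)   # a single group named "wool"

# Indexing the column by its name splits on the factor's levels instead.
by_column <- split(warpbreaks, warpbreaks[[strat.var]])
names(by_column)   # "A" "B"

# Sum of per-PSU means within each stratum, computed on the by-level split.
mns_by_column <- sapply(by_column, function(x) {
  sum(tapply(x[["breaks"]], x[["tension"]], mean))
})
mns_by_column
```

If this is indeed the cause, replacing the with() line inside myFun with split(dat, dat[[strat.var]]) should reproduce the by-hand numbers.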
I'm using sapply() because of posts like this one and this one. And I'm not using subset() because I couldn't get it to work. I'm also open to any suggestions using dplyr.
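For the dplyr route, one possible shape is sketched below, assuming dplyr 1.0 or later is installed. The .data pronoun lets column names that arrive as strings be used inside group_by() and summarise(). The function name psu_summary and the column names stratum, psu, and psu_mean are hypothetical and not part of the original code.

```r
library(dplyr)

# Hypothetical helper: column names are passed as strings, as in myFun.
psu_summary <- function(dat, strat.var, PSU, var1) {
  dat %>%
    # One row per stratum/PSU combination, holding that PSU's mean
    group_by(stratum = .data[[strat.var]], psu = .data[[PSU]]) %>%
    summarise(psu_mean = mean(.data[[var1]]), .groups = "drop") %>%
    # Collapse to one row per stratum: PSU count and sum of PSU means
    group_by(stratum) %>%
    summarise(ns = n(), mns = sum(psu_mean))
}

psu_summary(warpbreaks, strat.var = "wool", PSU = "tension", var1 = "breaks")
```

Unlike the cbind() step in myFun, this keeps ns and mns numeric rather than converting them to character.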
Thanks in advance for any and all help!