I want to apply a function to a subset of my dataframe. Let this function be CrossTable() from {gmodels}, which gives you a crosstab for two categorical variables. My question is not specifically about that function, though; ideally the same solution should apply to any other function, such as table().
Now, I know how to subset dataframes, save the output and work with it, but what if I wanted to do all of this in one short step?
Here's my data and here's what I tried:
mydata <- data.frame(var1=c(rep(1:3,5)),
var2=c(5,1,1,4,2,3,5,2,2,5,1,2,4,1,1))
library(gmodels)
CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") # For the whole dataset
if (mydata$var1>1) CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
The if condition yields the warning "the condition has length > 1 and only the first element will be used", and I assume this is because an if (condition) statement cannot be applied to vectors from dataframes. Is that correct? In Stata, where you can just type if var == x, this seems to work very differently.
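To make the warning concrete, here is a minimal sketch of what the condition evaluates to, using the same mydata as above (the name cond is only for illustration). As far as I understand it, if () expects a single TRUE/FALSE, whereas the comparison returns one value per row; I believe newer R versions (4.2.0 and later) turn this warning into an error.

```r
# Same data as in the question
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5,1,1,4,2,3,5,2,2,5,1,2,4,1,1))

# The comparison is vectorized: it returns one logical value per row
cond <- mydata$var1 > 1   # 'cond' is an illustrative name

length(cond)    # 15, not 1
head(cond, 3)   # FALSE TRUE TRUE
```

So the condition handed to if () is a logical vector of length 15 rather than a single value, which seems to be what the warning is complaining about.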
library(tidyverse)
mydata %>% filter(var1>1) %>% CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
This is already plan B, and I would really like to go with plan A, but this tidyverse solution does not seem to do the trick either, because CrossTable(), like so many other functions (such as table()), cannot handle tidyselect objects.
CrossTable(mydata$var1[mydata$var1>1], mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
This is plan C and, in that order, my least favored option. So it's a good thing this doesn't work either, because it obviously produces two vectors of different lengths: var1 will be shorter than var2 by five observations.
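A quick check of the lengths seems to confirm the mismatch. This is only a sketch on the same mydata; the names n_var1 and n_var2 are just mine for illustration.

```r
# Same data as in the question
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5,1,1,4,2,3,5,2,2,5,1,2,4,1,1))

# Only var1 is subset in plan C; var2 keeps all rows
n_var1 <- length(mydata$var1[mydata$var1 > 1])  # 10
n_var2 <- length(mydata$var2)                   # 15

n_var2 - n_var1  # 5
```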
Does anyone have a solution, or maybe even multiple solutions? Can anyone tell me how to make plans A through C work? That would be great!