
I want to apply a function to a subset of my data frame. Let this function be CrossTable() from {gmodels}, which gives you a crosstab for two categorical variables. My question is not specifically about that function, though, and ideally the same solution should apply to any other function, too, such as table().

Now, I know how to subset dataframes, save the output and work with it, but what if I wanted to do all of this in one short step?

Here's my data and here's what I tried:

mydata <- data.frame(var1=c(rep(1:3,5)),
                     var2=c(5,1,1,4,2,3,5,2,2,5,1,2,4,1,1))

library(gmodels)
CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") # For the whole dataset

if (mydata$var1>1) CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") 

The if condition yields the warning "the condition has length > 1 and only the first element will be used", and I assume this is because, for some reason, an if (condition) statement cannot be applied to vectors from data frames. Is that correct? In Stata, where you can just type if var == x, this seems to work very differently.

library(tidyverse)
mydata %>% filter(var1>1) %>% CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") 

This is already plan B, and I would really prefer plan A, but this tidyverse solution doesn't seem to do the trick either, because CrossTable(), like so many other functions (such as table()), cannot handle tidyselect objects.

CrossTable(mydata$var1[mydata$var1>1], mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") 

This is plan C, and, following that order, it is my least favored option. So it's a good thing this doesn't work either, because it obviously produces two vectors of different lengths: var1 will be shorter than var2 by five observations.

Does anyone have a solution, or maybe even multiple solutions? Can anyone tell me how to make plans A through C work? That would be great!

Dr. Fabian Habersack

2 Answers


Another way could be to evaluate the call with with() on the subsetted data:

with(mydata[mydata$var1 > 1,], CrossTable(var1, var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS"))
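Because with() evaluates its second argument using the columns of the subset, this pattern is not tied to CrossTable(). A minimal sketch, using base R's table() instead so it runs without {gmodels}; apply_where() is a hypothetical helper written for illustration, not part of any package:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# with() evaluates the expression against the columns of the subsetted data frame
tab1 <- with(mydata[mydata$var1 > 1, ], table(var1, var2))

# Hypothetical helper: keep the rows flagged by a logical vector, then call
# any two-argument function FUN on two named columns of that subset;
# extra arguments are forwarded to FUN through ...
apply_where <- function(data, keep, FUN, x, y, ...) {
  sub <- data[keep, , drop = FALSE]
  FUN(sub[[x]], sub[[y]], ...)
}

tab2 <- apply_where(mydata, mydata$var1 > 1, table, "var1", "var2")
print(tab1)
```

Assuming {gmodels} is installed, passing CrossTable as FUN together with its usual arguments (digits = 2, format = "SPSS", ...) should work the same way, since those are forwarded through `...`.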
Joe

Ideally, you would subset the data first and then pass the subset to the function you want to use:

mydf <- subset(mydata, var1 > 1)
CrossTable(mydf$var1, mydf$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")

The if statement doesn't subset the data; it only checks a single condition.
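To make the difference concrete: mydata$var1 > 1 is a logical vector with one value per row, while `if` expects exactly one TRUE or FALSE. Since R 4.2.0, a condition of length greater than one is an error rather than the warning quoted in the question. A minimal sketch, using base R only (the names keep, outcome and sub are illustrative):

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# The comparison is vectorised: one TRUE/FALSE per row (15 values here)
keep <- mydata$var1 > 1

# `if` expects a single TRUE/FALSE. A length-15 condition gives a warning
# in R < 4.2.0 and an error from R 4.2.0 onwards; record which one we get
outcome <- tryCatch({
  if (keep) "ran"
  "no condition signalled"
}, warning = function(w) "warning", error = function(e) "error")

# Reduce the vector to one value if you really need a condition...
if (any(keep)) {
  # ...but the rows are selected by the logical vector inside [ ], not by `if`
  sub <- mydata[keep, ]
}
```

In other words, the closest R equivalent of Stata's "command if var == x" is indexing with the logical vector, not an if statement.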

If you don't want to create a separate subset and would rather do it in one go, you could filter both vectors with the same condition:

CrossTable(mydata$var1[mydata$var1 > 1], mydata$var2[mydata$var1 > 1], digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
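This is also why plan C failed: only the first vector was filtered, so the two arguments no longer line up row by row. A minimal sketch with base R's table() (used here instead of CrossTable() so no package is needed) that shows the length mismatch and then stores the row index once so it cannot drift between the two vectors; the names x_short, lens, err and keep are illustrative:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# Plan C filters only the first vector, so the lengths no longer match
x_short <- mydata$var1[mydata$var1 > 1]
lens <- c(length(x_short), length(mydata$var2))  # 10 and 15

# table() refuses vectors of different lengths; capture the error message
err <- tryCatch(table(x_short, mydata$var2),
                error = function(e) conditionMessage(e))

# Fix: build the row index once and apply it to both vectors
keep <- mydata$var1 > 1
tab <- table(mydata$var1[keep], mydata$var2[keep])
print(tab)
```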

Or, using dplyr, we could do:

library(dplyr)
mydata %>% 
  filter(var1 > 1) %>%
  {CrossTable(.$var1, .$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")}
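The braces matter. Without them, %>% inserts the filtered data frame as the first argument (a . nested inside .$var1 does not stop this), which is what broke plan B: the call effectively became CrossTable(<filtered data>, mydata$var1, mydata$var2, ...), and mydata$var1 still referred to the unfiltered data. A minimal sketch of the same rule with base R's native pipe |> (requires R >= 4.1.0), so it needs no extra packages; count_args() is a hypothetical helper written only for this illustration:

```r
# Requires R >= 4.1.0 for the native pipe |>
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# Hypothetical helper: reports how many arguments it received
count_args <- function(...) length(list(...))

# x |> f(a, b) is evaluated as f(x, a, b): the piped value is prepended
n_piped <- mydata |> count_args(mydata$var1, mydata$var2)  # 3, not 2

# Piping into with() makes the function see only the subset's columns
tab <- mydata |>
  subset(var1 > 1) |>
  with(table(var1, var2))
print(tab)
```

Swapping table() for CrossTable() inside with() should behave the same way, assuming {gmodels} is loaded.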
Ronak Shah
  • OK, thanks, that works, and I know it would be better to do this in two steps. Thank you also for fixing my plans B and C. But can you think of a way to make the if statement work as well, or is this impossible? Why can't I let R check whether a condition is true and then apply the function on that basis? – Dr. Fabian Habersack Sep 14 '19 at 13:37
  • @FabianHabersack An `if` statement can only check a condition; it cannot subset the data. To subset, you need another function. If you want R to check for a condition and then apply the function, you can use my first answer, or Joe's answer, which subsets the data in one line. – Ronak Shah Sep 14 '19 at 13:44