1

I have a data frame with four columns: two columns indicate participation in a sport, while the other two columns indicate whether the player passed each of their two fitness exams.

dat <- data.frame(SOCCER = sample(0:1, 10, replace = T),
                  BASEBALL = sample(0:1, 10, replace = T),
                  TEST_1_PASS = sample(0:1, 10, replace = T),
                  TEST_2_PASS = sample(0:1, 10, replace = T))

I would like to obtain a list containing contingency tables for each sport and exam. I know that I can accomplish this using the following code, which uses nested lapply statements, but this strikes me as inefficient. Can anyone propose a more elegant solution that doesn't use nesting?

results <- lapply(c("SOCCER", "BASEBALL"), function(x) {
  lapply(c("TEST_1_PASS", "TEST_2_PASS"), function(y){
    table(sport = dat[[x]], pass = dat[[y]])
  })
})

Thanks as always!

DJC
  • 1,491
  • 6
  • 19
  • do you necessarily want the contigency tables or the mean and number of success per sport? – Gallarus Feb 06 '20 at 17:21
  • Contingency tables specifically. Need them for a whole heap of fisher tests – DJC Feb 06 '20 at 17:26
  • Also, where is `PLAYER` indicator? Please `dput` actual data as advised: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451). – Parfait Feb 06 '20 at 17:32
  • I don't think your nesting is inefficient... you need to call table 4 times. It doesn't really matter if that's inside one loop/lapply over 4 items or 2 nested loops/lapplys with 2 items each. – Gregor Thomas Feb 06 '20 at 17:33
  • You could clean up the syntax a little if you switched to `purrr` (e.g., use `~` for anonymous functions, and you'd get nicer names in the resulting lists using `purrr::map` instead of `lapply`), but it'd basically be the same. – Gregor Thomas Feb 06 '20 at 17:34

2 Answers2

4

The double lapply gets all pairwise combinations of the columns in each of the columns' vectors, like @Gregor wrote in a comment

I don't think your nesting is inefficient... you need to call table 4 times. It doesn't really matter if that's inside one loop/lapply over 4 items or 2 nested loops/lapplys with 2 items each.

But here is another way, with one of the loops in disguise as expand.grid.

cols <- expand.grid(x = c("SOCCER", "BASEBALL"), 
                    y = c("TEST_1_PASS", "TEST_2_PASS"),
                    stringsAsFactors = FALSE)
Map(function(.x, .y)table(dat[[.x]], dat[[.y]]), cols$x, cols$y)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
0

Consider re-formatting your data into long format (i.e., tidy data), merging the two data pieces of sport and exam, then run by (a rarely used member of apply family as object-oriented wrapper to tapply) for all subset combinations between the two returning a named header report of results:

# RESHAPE EACH DATA SECTION (SPORT AND EXAM) INTO LONG FORMAT
df_list <- lapply(list(c("SOCCER", "BASEBALL"), c("TEST_1_PASS", "TEST_2_PASS")), function(cols)
                   reshape(cbind(PLAYER = row.names(dat), dat[cols]), 
                           varying = cols, v.names = "VALUE", 
                           times = cols, timevar = "INDICATOR", 
                           idvar = "PLAYER", ids = NULL,
                           new.row.names = 1:1E4, direction = "long")
           )

# CROSS JOIN (ALL COMBINATION PAIRINGS)
final_df <- Reduce(function(x,y) merge(x, y, by="PLAYER", suffixes=c("_SPORT", "_EXAM")), df_list)
final_df

# RUN TABLES FOR EACH SUBSET COMBINATION
tables_list <- with(final_df, by(final_df, list(INDICATOR_SPORT, INDICATOR_EXAM), function(sub)
                                  table(sport = sub$VALUE_SPORT, pass = sub$VALUE_EXAM)
                                )
                   )    

Output

tables_list

# : BASEBALL
# : TEST_1_PASS
#      pass
# sport 0 1
#     0 3 4
#     1 2 1
# ------------------------------------------------------------ 
# : SOCCER
# : TEST_1_PASS
#      pass
# sport 0 1
#     0 2 0
#     1 3 5
# ------------------------------------------------------------ 
# : BASEBALL
# : TEST_2_PASS
#      pass
# sport 0 1
#     0 3 4
#     1 1 2
# ------------------------------------------------------------ 
# : SOCCER
# : TEST_2_PASS
#      pass
# sport 0 1
#     0 2 0
#     1 2 6

Online Demo

Parfait
  • 104,375
  • 17
  • 94
  • 125