I'd like to use dplyr to group a table by one column, then apply a function to the set of values in the second column of each group.

For instance, in the code example below, I'd like to return all of the 2-item combinations of foods eaten by each person. I cannot figure out how to pass the right column (foods) to the function inside do().

library(dplyr)

person = c( 'Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob' )
foods   = c( 'apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana' )
eaten  = data.frame(person, foods)

by_person = group_by(eaten, person)

# How to do this?
do( by_person, combn( x = foods, m = 2 ) )

Note that the example code in ?do fails on my machine:

mods <- do(carriers, failwith(NULL, lm), formula = ArrDelay ~ date)
zimmeee
  • After mucking around I saw that this question was re-asked and answered given evolution of `dplyr` https://stackoverflow.com/q/26336180/8400969 – Michael Roswell Jun 28 '19 at 16:17

1 Answer

Let us define eaten like this:

eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

1) Then try this:

eaten %.% group_by(person) %.% do(function(x) combn(x$foods, m = 2))

giving:

[[1]]
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

[[2]]
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana"  
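As a cross-check on the output above, here is a minimal sketch of the same per-person computation in base R, without dplyr. This swaps in split() and lapply() for group_by() and do(); the name pairs_by_person is made up for illustration and is not part of the original answer.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps foods as character.
person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# split() breaks foods into one character vector per person;
# lapply() then calls combn() on each vector, passing m = 2 through.
# pairs_by_person is a hypothetical name chosen for this sketch.
pairs_by_person <- lapply(split(eaten$foods, eaten$person), combn, m = 2)

# Each element is a character matrix with one column per 2-item combination.
pairs_by_person$Grace
```

The result is a list named by person, so a single person's combinations can be pulled out with $ or [[ ]].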

2) To get close to what @Hadley describes in the comments without waiting for a future version of dplyr, try this, where do2 is found here:

library(gsubfn)
eaten %.% group_by(person) %.% fn$do2(~ combn(.$foods, m = 2))

giving:

$Grace
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

$Rob
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana"  

Note: The last line of the question, which gives the code from the help file, also fails for me. This variation of it works for me: do(jan, lm, formula = ArrDelay ~ date)
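Both approaches above use the %.% operator from the dplyr release current when this answer was written. As a later comment on this answer points out, dplyr has since moved to %>%. The following is a sketch of that comment's suggestion, assuming a recent dplyr in which do() is still available; the column name k comes from the comment, and the variable name pairs is made up for illustration.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps foods as character.
person <- c("Grace", "Grace", "Grace", "Rob", "Rob", "Rob")
foods  <- c("apple", "banana", "cucumber", "spaghetti", "cucumber", "banana")
eaten  <- data.frame(person, foods, stringsAsFactors = FALSE)

# Inside do(), . refers to the rows of the current group.
# Naming the argument (k) stores each group's result in a list column.
# pairs is a hypothetical name chosen for this sketch.
pairs <- eaten %>%
  group_by(person) %>%
  do(k = combn(.$foods, m = 2))

# One row per person; k holds that person's matrix of 2-item combinations.
pairs$k[[1]]
```

The result has one row per person, and each element of k is a matrix with one column per pair of foods.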

G. Grothendieck
  • In a future version of dplyr you'll be able to do something more like `do(combn(.$foods, m = 2))` and the components will automatically be given useful names. – hadley Mar 04 '14 at 21:35
  • Thank you so much for the helpful solution! Minor typo in stringsAsFactors in the first line. – zimmeee Mar 04 '14 at 21:59
  • introduced a new one this time :) – zimmeee Mar 04 '14 at 22:17
  • I think `dplyr` has changed since this answer (#1) was created. One of those changes was going from `%.%` to `%>%`. I fiddled a bit, and the following code produced a tibble with basically the desired output, I think: `eaten %>% group_by(person) %>% do(k=combn(.$foods, m = 2))` Here, k is the column in the tibble where each element is a matrix, where each column has two unique food types possibly eaten by that person, and there are three columns for the three unique pairs. – Michael Roswell Jun 28 '19 at 16:08
  • As noted above as well, this was re-visited in a new question: https://stackoverflow.com/q/26336180/8400969 – Michael Roswell Jun 28 '19 at 16:21