I tried to reproduce the result from another SO question, "dplyr: How to apply do() on result of group_by?".

Here is the data:

person = c('Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob')
foods = c('apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana')
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

The result I was trying to replicate is:

[[1]]
     [,1]     [,2]       [,3]      
[1,] "apple"  "apple"    "banana"  
[2,] "banana" "cucumber" "cucumber"

[[2]]
     [,1]        [,2]        [,3]      
[1,] "spaghetti" "spaghetti" "cucumber"
[2,] "cucumber"  "banana"    "banana" 

The original code that produced the result above no longer works:

> eaten %>% group_by(person) %>% do(function(x) combn(x$foods, m = 2))
Error: Results are not data frames at positions: 1, 2

I tried several other ways of calling do(), to no avail:

> eaten %>% group_by(person) %>% do(combn(.$foods, m = 2))
Error: Results are not data frames at positions: 1, 2

> eaten %>% group_by(person) %>% do(.$foods, combn, m =2)
Error: Arguments to do() must either be all named or all unnamed

> eaten %>% group_by(person) %>% do((combn(.$foods, m=2)))
Error: Results are not data frames at positions: 1, 2

Only the one below seems to work, though with warning messages:

> eaten %>% group_by(person) %>% do(as.data.frame(combn(.$foods, m = 2)))
#   person        V1        V2       V3
# 1  Grace     apple     apple   banana
# 2  Grace    banana  cucumber cucumber
# 3    Rob spaghetti spaghetti cucumber
# 4    Rob  cucumber    banana   banana
# Warning messages:
# 1: In rbind_all(out[[1]]) : Unequal factor levels: coercing to character
# 2: In rbind_all(out[[1]]) : Unequal factor levels: coercing to character

I believe the behavior of do() must have changed in the new version. What are the changes? What is the right idiom for using do()? Thanks.

EDIT: I installed the latest dplyr and ran the code suggested by @hadley:

packageVersion("dplyr")
[1] ‘0.3.0.2’

eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))
# Source: local data frame [2 x 2]
# Groups: <by row>
#   
#   person          x
# 1  Grace <chr[2,3]>
# 2    Rob <chr[2,3]>

EDIT2: I need to extract column "x", as suggested by @hadley:

eaten2 <- eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))
eaten2[["x"]]
# [[1]]
#      [,1]     [,2]       [,3]      
# [1,] "apple"  "apple"    "banana"  
# [2,] "banana" "cucumber" "cucumber"
# 
# [[2]]
#      [,1]        [,2]        [,3]      
# [1,] "spaghetti" "spaghetti" "cucumber"
# [2,] "cucumber"  "banana"    "banana" 
KFB
  • I only tested in dplyr 0.2 and got the same warning about unequal factor levels. To get rid of those (at least in 0.2), you can just modify your `do` to: `do(as.data.frame(combn(.$foods, m = 2), stringsAsFactors = FALSE ))` - Hope it helps – talat Oct 13 '14 at 10:46
  • It looks quite non-idiomatic and strange to have to repeat the stringsAsFactors argument inside do(). Anyhow, I tried it and it did solve the problem. However, I would like to learn whether there is a proper idiom for using do(), and why the behaviour changed (or perhaps did not change at all)? – KFB Oct 13 '14 at 11:00
  • You need to name the argument: `eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))` – hadley Oct 13 '14 at 14:10
  • @hadley, it does not work. – KFB Oct 13 '14 at 14:15
  • @hadley, I posted the result of the code into the post. – KFB Oct 13 '14 at 14:22
  • @KFB extract the `x` column, and you'll get what you want. – hadley Oct 15 '14 at 20:12

2 Answers

Moving EDIT2 from the question into an answer so that the question can be closed:

With dplyr 0.3.0.2 or later, you need to extract column "x", as suggested by @hadley:

eaten2 <- eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2))
eaten2[["x"]]
# [[1]]
#      [,1]     [,2]       [,3]      
# [1,] "apple"  "apple"    "banana"  
# [2,] "banana" "cucumber" "cucumber"
# 
# [[2]]
#      [,1]        [,2]        [,3]      
# [1,] "spaghetti" "spaghetti" "cucumber"
# [2,] "cucumber"  "banana"    "banana"  
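
If only the list of matrices is needed, the same grouping can also be sketched without do() at all. This is a base R alternative rather than the dplyr idiom discussed here, and `pairs_by_person` is just a name chosen for illustration:

```r
# Same data as in the question
person <- c('Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob')
foods <- c('apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana')
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

# split() groups the foods by person; lapply() then calls combn(x, m = 2)
# on each group, giving a named list with one 2 x 3 matrix per person
pairs_by_person <- lapply(split(eaten$foods, eaten$person), combn, m = 2)

pairs_by_person[["Grace"]]
```

Unlike the unnamed list returned by `eaten2[["x"]]`, the list elements here are named by person, so a single person's combinations can be looked up directly.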
KFB
  • With magrittr 1.5 you could also do `eaten %>% group_by(person) %>% do(x = combn(.$foods, m = 2)) %$% x` – talat Dec 30 '14 at 14:10

Obviously this is a matter of preference and of what the data are for, but I think one of the possibilities above (wrapping combn() in as.data.frame() inside do()) is really clever for producing a usable, tidy data frame. Combined with tidyr::gather, it returns an object that makes clear who ate what in which meal, without extracting anything.

library(dplyr)
library(tidyr)

person <- c('Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob')
foods <- c('apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana')
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)
# one row per person and food, with the combination label (V1..V3) in `meal`
eaten %>% group_by(person) %>% do(as.data.frame(combn(.$foods, m = 2))) %>% gather(meal, foods, -1)

returns

# Groups:   person [2]
   person meal  foods    
   <chr>  <chr> <chr>    
 1 Grace  V1    apple    
 2 Grace  V1    banana   
 3 Rob    V1    spaghetti
 4 Rob    V1    cucumber 
 5 Grace  V2    apple    
 6 Grace  V2    cucumber 
 7 Rob    V2    spaghetti
 8 Rob    V2    banana   
 9 Grace  V3    banana   
10 Grace  V3    cucumber 
11 Rob    V3    cucumber 
12 Rob    V3    banana   
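
To make the reshaping step concrete, here is a rough base R sketch of the same long layout. It is an illustration only, not what gather() does internally; `pairs_by_person` and `long` are names invented for this example, and the rows come out grouped by person rather than by meal:

```r
# Same data as above
person <- c('Grace', 'Grace', 'Grace', 'Rob', 'Rob', 'Rob')
foods <- c('apple', 'banana', 'cucumber', 'spaghetti', 'cucumber', 'banana')
eaten <- data.frame(person, foods, stringsAsFactors = FALSE)

# One 2 x 3 matrix of food pairs per person
pairs_by_person <- lapply(split(eaten$foods, eaten$person), combn, m = 2)

# Unroll each matrix column by column: col(m) gives the column index of every
# cell, which becomes the meal label (V1, V2, V3)
long <- do.call(rbind, lapply(names(pairs_by_person), function(p) {
  m <- pairs_by_person[[p]]
  data.frame(
    person = p,
    meal = paste0("V", as.vector(col(m))),
    foods = as.vector(m),
    stringsAsFactors = FALSE
  )
}))

long
```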
Michael Roswell