37

There are a couple of issues about this on the dplyr GitHub repo already, and at least one related SO question, but I don't think any of them quite covers my question.

  • Adding multiple columns in a dplyr mutate call is more or less what I want, but the answer there relies on a special-case tool (tidyr::separate) that (I think) doesn't work for me.
  • This issue ("summarise or mutate with functions returning multiple values/columns") says "use do()".

Here's my use case. I want to compute exact binomial confidence intervals:

dd <- data.frame(x=c(3,4),n=c(10,11))
get_binCI <- function(x,n) {
    rbind(setNames(c(binom.test(x,n)$conf.int),c("lwr","upr")))
}
with(dd[1,],get_binCI(x,n))
##             lwr       upr
## [1,] 0.06673951 0.6524529

I can get this done with do() but I wonder if there's a more expressive way to do this (it feels like mutate() could have a .n argument as is being discussed for summarise() ...)

library("dplyr")
dd %>% group_by(x,n) %>%
    do(cbind(.,get_binCI(.$x,.$n)))

## Source: local data frame [2 x 4]
## Groups: x, n
## 
##   x  n        lwr       upr
## 1 3 10 0.06673951 0.6524529
## 2 4 11 0.10926344 0.6920953
Ben Bolker
  • 2
    Are you set on doing this with `dplyr` specifically? With `data.table` you can quickly do `setDT(dd)[, as.list(get_binCI(x, n)), by = .(x, n)]`. Though my mind-reading skills don't let me work out exactly what you mean by "*expressive way*"... – David Arenburg Apr 13 '15 at 20:59
  • 4
    This is certainly good. I *was* hoping for a `dplyr` answer (although I will not be surprised if my solution above is the best one can do ATM). I have nothing against `data.table`, but I prefer `dplyr`, and -- mostly -- I'm still spending a lot of brainpower getting my head around it, don't really want to add a whole new set of syntax (nor inflict it on my students and colleagues) at the moment. But if you answer that way I'll upvote, it's useful. – Ben Bolker Apr 13 '15 at 21:01
  • 1
    Hi all, hoping to bump this up; is there now a better way to do this with nesting? I'm trying but haven't gotten it yet. – Aaron left Stack Overflow Oct 05 '17 at 18:51
  • @Aaron, I've had a go at using `unnest` that also uses `map2` that you might be interested in – markdly Oct 30 '17 at 06:08

7 Answers

19

Yet another variant, although I think we're all splitting hairs here.

> dd <- data.frame(x=c(3,4),n=c(10,11))
> get_binCI <- function(x,n) {
+   as_tibble(setNames(as.list(binom.test(x,n)$conf.int),c("lwr","upr")))
+ }
> 
> dd %>% 
+   group_by(x,n) %>%
+   do(get_binCI(.$x,.$n))
Source: local data frame [2 x 4]
Groups: x, n

  x  n        lwr       upr
1 3 10 0.06673951 0.6524529
2 4 11 0.10926344 0.6920953

Personally, if we're just going by readability, I find this preferable:

foo  <- function(x,n){
    bi <- binom.test(x,n)$conf.int
    tibble(lwr = bi[1],
           upr = bi[2])
}

dd %>% 
    group_by(x,n) %>%
    do(foo(.$x,.$n))

...but now we're really splitting hairs.

joran
18

Yet another option could be to use the purrr::map family of functions.

If you replace rbind with dplyr::bind_rows in the get_binCI function:

library(tidyverse)

dd <- data.frame(x = c(3, 4), n = c(10, 11))
get_binCI <- function(x, n) {
  bind_rows(setNames(c(binom.test(x, n)$conf.int), c("lwr", "upr")))
}

You can use purrr::map2 with tidyr::unnest:

dd %>% mutate(result = map2(x, n, get_binCI)) %>% unnest(result)

#>   x  n        lwr       upr
#> 1 3 10 0.06673951 0.6524529
#> 2 4 11 0.10926344 0.6920953

Or purrr::map2_dfr with dplyr::bind_cols:

dd %>% bind_cols(map2_dfr(.$x, .$n, get_binCI))

#>   x  n        lwr       upr
#> 1 3 10 0.06673951 0.6524529
#> 2 4 11 0.10926344 0.6920953
markdly
  • 1
    In dplyr 0.8.5 this will need to be `dd %>% mutate(result = map2(x, n, get_binCI)) %>% unnest(result)`. Also, the help for unnest suggests it is mainly intended for lists of data frames. Alternative approaches are suggested in the help file. – Tony Ladson Mar 30 '20 at 20:27
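Following up on the comment above about `unnest` being aimed at lists of data frames: one alternative, sketched here on the assumption that tidyr 1.0.0 or later is installed, is to have the helper return a plain named vector and widen the list-column with `tidyr::unnest_wider`. The helper name `get_binCI_vec` is made up for this illustration and is not part of the answer above.

```r
library(dplyr)
library(purrr)
library(tidyr)

dd <- data.frame(x = c(3, 4), n = c(10, 11))

# Hypothetical helper: returns a named numeric vector instead of a data frame
get_binCI_vec <- function(x, n) {
  ci <- as.numeric(binom.test(x, n)$conf.int)  # drop the conf.level attribute
  setNames(ci, c("lwr", "upr"))
}

res <- dd %>%
  mutate(result = map2(x, n, get_binCI_vec)) %>%  # list-column of named vectors
  unnest_wider(result)                            # one new column per vector name
```

Here `res` should carry the original `x` and `n` columns plus `lwr` and `upr`, one row per input row.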
7

Here's a quick solution using the data.table package instead.

First, a little change to the function

get_binCI <- function(x,n) as.list(setNames(binom.test(x,n)$conf.int, c("lwr", "upr")))

Then, simply

library(data.table)
setDT(dd)[, get_binCI(x, n), by = .(x, n)]
#    x  n        lwr       upr
# 1: 3 10 0.06673951 0.6524529
# 2: 4 11 0.10926344 0.6920953
David Arenburg
  • here is a base solution @David Arenburg!! `dd[, c('lwr','upr')] <- t(mapply(get_binCI, dd[, 1], dd[, 2]))` – rawr Apr 13 '15 at 21:18
  • 7
    @rawr I'm not sure why you are posting this as a comment under my answer :) I would suggest you post this as your own solution (I promise to upvote). – David Arenburg Apr 13 '15 at 21:20
  • @rawr, is `Map()` safer (no simplification)? – Ben Bolker Apr 13 '15 at 21:21
  • @BenBolker but I guess you'd also have to use a `do.call`, though, right? – rawr Apr 13 '15 at 21:24
  • I thought `Map()` and `mapply()` were basically identical: `‘Map’ is a simple wrapper to ‘mapply’ which does not attempt to simplify the result, similar to Common Lisp's ‘mapcar’ (with arguments being recycled, however). ` – Ben Bolker Apr 13 '15 at 21:26
  • @BenBolker the only difference is that `mapply`'s default is `SIMPLIFY = TRUE` whereas `Map` uses `SIMPLIFY = FALSE`, and obviously you cannot change `Map`'s default – rawr Apr 13 '15 at 21:37
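The `Map()` plus `do.call()` idea from these comments could look roughly like the sketch below. It is only an illustration in base R: the helper name `ci_row` and the result name `dd_ci` are made up here, and the helper returns a named numeric vector rather than the list used in the answer above.

```r
dd <- data.frame(x = c(3, 4), n = c(10, 11))

# Hypothetical helper: one named numeric vector per (x, n) pair
ci_row <- function(x, n) {
  setNames(as.numeric(binom.test(x, n)$conf.int), c("lwr", "upr"))
}

# Map() never simplifies, so it always returns a list with one element per row
ci_list <- Map(ci_row, dd$x, dd$n)

# Stack the vectors into a matrix; the column names come from the vector names
ci_mat <- do.call(rbind, ci_list)

# Attach the interval columns to the original data
dd_ci <- cbind(dd, ci_mat)
```

Because the list is stacked explicitly, the shape of the result does not depend on whether `mapply()` happens to simplify.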
7

Here are some possibilities with rowwise and nesting.

library("dplyr")
library("tidyr")

A data frame with repeated x/n combinations, for fun:

dd <- data.frame(x=c(3, 4, 3), n=c(10, 11, 10))

A version of the CI function that returns a data frame, like @joran's:

get_binCI_df <- function(x,n) {
  binom.test(x, n)$conf.int %>% 
    setNames(c("lwr", "upr")) %>% 
    as.list() %>% as.data.frame()
}

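Grouping by x and n as before removes the duplicate. Because the first group now holds two rows, .$x is c(3, 3) there, and binom.test() would read a length-2 x as counts of successes and failures (3 out of 6) and ignore n, so take the first element of each:

dd %>% group_by(x,n) %>% do(get_binCI_df(.$x[1],.$n[1]))
# # A tibble: 2 x 4
# # Groups:   x, n [2]
#       x     n        lwr       upr
#   <dbl> <dbl>      <dbl>     <dbl>
# 1     3    10 0.06673951 0.6524529
# 2     4    11 0.10926344 0.6920953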

Using rowwise keeps all the rows but drops x and n unless you put them back using cbind(., ...) (as Ben does in the question).

dd %>% rowwise() %>% do(cbind(., get_binCI_df(.$x,.$n)))
# Source: local data frame [3 x 4]
# Groups: <by row>
#   
# # A tibble: 3 x 4
#       x     n        lwr       upr
# * <dbl> <dbl>      <dbl>     <dbl>
# 1     3    10 0.06673951 0.6524529
# 2     4    11 0.10926344 0.6920953
# 3     3    10 0.06673951 0.6524529

It feels like nesting could work more cleanly, but this is as good as I can get. Using mutate means I can use x and n directly instead of .$x and .$n, but mutate expects a single value, so it needs to be wrapped in list.

dd %>% rowwise() %>% mutate(ci=list(get_binCI_df(x, n))) %>% unnest(ci)
# # A tibble: 3 x 4
#       x     n        lwr       upr
#   <dbl> <dbl>      <dbl>     <dbl>
# 1     3    10 0.06673951 0.6524529
# 2     4    11 0.10926344 0.6920953
# 3     3    10 0.06673951 0.6524529

Finally, it looks like something along these lines is an open dplyr issue (as of 5 Oct 2017); see https://github.com/tidyverse/dplyr/issues/2326. If it is implemented, that will be the easiest way!
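For readers on a newer dplyr, here is a hedged sketch of how the rowwise version might look without `list()` and `unnest()`. It assumes dplyr 1.0.0 or later, where `mutate()` spreads an unnamed data-frame result into columns (the behaviour another answer in this thread relies on for `summarise()`). The helper is a simplified stand-in for `get_binCI_df`, and `res` is just an illustrative name.

```r
library(dplyr)

dd <- data.frame(x = c(3, 4, 3), n = c(10, 11, 10))

# Simplified stand-in for get_binCI_df(): a one-row data frame per call
get_binCI_df <- function(x, n) {
  ci <- binom.test(x, n)$conf.int
  data.frame(lwr = ci[1], upr = ci[2])
}

res <- dd %>%
  rowwise() %>%                  # evaluate the helper once per row
  mutate(get_binCI_df(x, n)) %>% # unnamed data-frame result becomes columns
  ungroup()                      # drop the rowwise grouping afterwards
```

All three rows are kept, including the repeated x/n pair, and x and n stay in place without any `cbind()`.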

Aaron left Stack Overflow
5

This uses a "standard" dplyr workflow, but as @BenBolker notes in the comments, it requires calling get_binCI twice:

dd %>% group_by(x,n) %>%
  mutate(lwr=get_binCI(x,n)[1],
         upr=get_binCI(x,n)[2])

  x  n        lwr       upr
1 3 10 0.06673951 0.6524529
2 4 11 0.10926344 0.6920953
eipi10
  • Yes, it's a solution, but the ugliness of this is having to call `get_binCI()` twice. Sort of in the eye of the beholder as to whether this is better or worse than `do(cbind(.,data.frame(get_binCI(.$x,.$n))))` (I could get rid of the `data.frame()` by sticking it inside `get_binCI`) – Ben Bolker Apr 13 '15 at 21:11
  • I agree. I was just trying to find something with dplyr that would work without calling `do`. – eipi10 Apr 13 '15 at 21:12
3

Old question (with plenty of good answers), but this is a great use case for tidyverse's broom package, which deals with tidying output from test and modeling objects (such as binom.test, lm, etc).

It's more verbose than other methods, but I think it matches your desire for a more expressive approach.

The process is:

  1. Define the groups that you'll run binom.test on (in this case, those groups are defined by x and n) and nest them, creating separate data.frames for each (within the full data.frame)
  2. map the binom.test call to the x and n values from each group
  3. tidy the binom.test output for each group (this is where broom comes in)
  4. unnest the tidied test output data.frames into the full data.frame

Now you're left with a data.frame where each row contains the x and n values, combined with all of the output from the corresponding binom.test, neatly formatted with separate columns for each bit of output information (point estimate, upper/lower conf, p-value, etc).

library(tidyverse)
library(broom)
dd <- data.frame(x=c(3,4),n=c(10,11))
dd %>%
  group_by(x, n) %>%
  nest() %>%
  mutate(test = map(data, ~tidy(binom.test(x, n)))) %>%
  unnest(test)
#> # A tibble: 2 x 11
#> # Groups:   x, n [2]
#>       x     n data  estimate statistic p.value parameter conf.low conf.high
#>   <dbl> <dbl> <lis>    <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
#> 1     3    10 <tib…    0.3           3   0.344        10   0.0667     0.652
#> 2     4    11 <tib…    0.364         4   0.549        11   0.109      0.692
#> # … with 2 more variables: method <chr>, alternative <chr>

From here you can get to your exact desired format with just a bit more manipulation, selecting the desired output variables, and renaming them:

dd %>%
  group_by(x, n) %>%
  nest() %>%
  mutate(test = map(data, ~tidy(binom.test(x, n)))) %>%
  unnest(test) %>%
  rename(lwr = conf.low, upr = conf.high) %>%
  select(x, n, lwr, upr)
#> # A tibble: 2 x 4
#> # Groups:   x, n [2]
#>       x     n    lwr   upr
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     3    10 0.0667 0.652
#> 2     4    11 0.109  0.692

As mentioned, it's verbose. Much more so than (for example) @joran's beautifully succinct

dd %>% 
    group_by(x,n) %>%
    do(foo(.$x,.$n))

However, the benefit of the broom approach is that you won't need to define a function foo (or get_binCI). It's fully self-contained, and in my opinion far more expressive and flexible.

RyanFrost
  • Great and up-to-date answer. If compared to the OP's chosen answer, which would be faster (supposing that raw speed is what we are after)? – venrey May 12 '20 at 07:37
1

Here is another option, which relies on mutate and summarise automatically unpacking an unnamed tibble result into separate columns.

dd <- data.frame(x=c(3,4),n=c(10,11))

get_binCI <- function(x,n) {
  s1 <- binom.test(x,n)$conf.int
  names(s1) <- c("lwr", "upr")
  as_tibble(as.list(s1))
}

dd %>% 
  group_by(x,n) %>%
  summarise(get_binCI(x, n))

# A tibble: 2 × 4
# Groups:   x [2]
      x     n    lwr   upr
  <dbl> <dbl>  <dbl> <dbl>
1     3    10 0.0667 0.652
2     4    11 0.109  0.692

The as_tibble(as.list()) part can be moved inside summarise when using functions like quantile:

mtcars %>% 
  group_by(cyl) %>% 
  summarise(as_tibble(as.list(quantile(mpg)))) 
Pete900