
This is quite possibly a duplicate of either or both of these. If so, apologies, and I guess that would make it an outstanding burning issue.

https://stackoverflow.com/questions/28388211/in-r-package-dplyr-how-to-use-own-defined-function-to-summarise-each

pass grouped dataframe to own function in dplyr

Using plyr one could run something like this:

ddply(mtcars, .(cyl), function(x) table(x$am))

and get the nice output

> ddply(mtcars, .(cyl), function(x) table(x$am))
  cyl  0 1
1   4  3 8
2   6  4 3
3   8 12 2

I still don't really get why ddply(mtcars, .(cyl), table(am)) doesn't work, but never mind.

Is there a way to achieve the above in dplyr?

mtcars %>%
  group_by(cyl) %>%
  function(x) table(x$am)

Doesn't achieve the same results.

UPDATED QUESTION (leaving the above for historical purposes).

In hindsight, while the above is something I would like to do from time to time, I was more trying to get at functionality like this:

blah <- function(x) {
  x$position <- 1:nrow(x)
  x$count <- nrow(x)
  return(x)
}

ddply(mtcars, .(cyl,am), function(x) blah(x))
nzcoops
  • I think `do()` is the general answer for your question, but it's a pain with this example because `table` doesn't return a data frame, nor is it all that easy to convert to the data frame you want. – Gregor Thomas Feb 11 '15 at 06:14
  • This is just a generic, reproducible example for coding purposes. I'd like to push it beyond table. – nzcoops Feb 11 '15 at 06:15
  • Have updated the question, which makes it different given the nuances of table, but still getting to the same broad point. – nzcoops Feb 11 '15 at 06:19
  • As noted by Gregor, you could use `do` in this case. With the `blah` function it would be `mtcars %>% group_by(cyl, am) %>% do(blah(.))` – talat Feb 11 '15 at 06:38

1 Answer

Turning my and docendo's comments into an answer, this is what do() is for.

mtcars %>% group_by(cyl, am) %>% do(blah(.))
# same results as
plyr::ddply(mtcars, plyr::.(cyl, am), function(x) blah(x))
# same as plyr with no anonymous function in this case
plyr::ddply(mtcars, plyr::.(cyl, am), blah)

Because blah is taking in your full data frame (at least in terms of columns) and returning a data frame, you don't need the anonymous function call.

A lot is similar between dplyr and ddply: if you want to add columns, you use mutate; if you want to collapse each group with aggregate functions, you use summarise. do is the dplyr equivalent of doing something else to each piece of data, but whatever you call inside it needs to return a data frame.
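
For the specific blah() in the question, the mutate route mentioned above would probably be enough. The following is only a sketch under that assumption: the names with_mutate and with_do are made up for illustration, and blah() is restated from the question so the snippet runs on its own.

```r
library(dplyr)

# blah() restated from the question (seq_len() used in place of 1:nrow(x)).
blah <- function(x) {
  x$position <- seq_len(nrow(x))
  x$count <- nrow(x)
  x
}

# Per-group columns added with mutate(): no helper function needed.
with_mutate <- mtcars %>%
  group_by(cyl, am) %>%
  mutate(position = row_number(),  # 1..n within each (cyl, am) group
         count = n()) %>%          # number of rows in the group
  ungroup()

# The do() version from the answer, for comparison.
with_do <- mtcars %>%
  group_by(cyl, am) %>%
  do(blah(.)) %>%
  ungroup()
```

Both results carry the same position and count values per group; the row order can differ, since do() returns the pieces group by group while mutate() keeps the original row order. In more recent dplyr releases do() is marked as superseded and group_modify() fills a similar role.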

Gregor Thomas
  • Thanks Gregor. Did you know of a way to get this working with table as in the first part of the question? do obviously won't work given table doesn't return a data frame. Unless there's a better function for getting 'table' like (cross tab) output? I generally use ctab from catspec but would have the same issue with that. – nzcoops Feb 11 '15 at 13:20
  • do() obviously just missed the cut for the intro vignette :) http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html – nzcoops Feb 11 '15 at 13:21
  • @nzcoops It's not the cleanest write-up for dplyr integration but [here's an example](http://stackoverflow.com/a/28438983/903061) using `reshape2::dcast` with `fun.aggregate = length`. (Coincidentally, it's the question I answered just before this one.) – Gregor Thomas Feb 11 '15 at 18:42
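
For the table-style output in the first part of the question, the `reshape2::dcast` idea from the last comment could look roughly like this. It is a sketch rather than a copy of the linked answer: it assumes reshape2 is installed, and the name xtab is made up for illustration.

```r
library(reshape2)

# Cross-tabulate cyl (rows) against am (columns), counting rows per cell.
# fun.aggregate takes the function itself (length), not a call to it.
xtab <- dcast(mtcars, cyl ~ am, value.var = "am", fun.aggregate = length)

print(xtab)
```

The result is an ordinary data frame with one row per cyl value and one count column per am value, so it can be passed on to further dplyr steps.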