The do function in the dplyr package usually produces a list. Is there a way to assign names to that list depending on the input to do? Specifically, I pass it the result of group_by and would like the names of the list to indicate which group each element corresponds to.

Here is a toy example of what I want to achieve:

> it = data.frame(ind=c("a","a","b","b","c"),var1=c(1,2,3,4,5), var2=c(2,3,4,2,2))
> group_by(it,ind)%.%summarise(min(var1))
Source: local data frame [3 x 2]

  ind min(var1)
1   c         5
2   b         3
3   a         1

Now do the same with do:

> do(group_by(it,ind),function(x)min(x[,"var1"]))
[[1]]
[1] 5

[[2]]
[1] 3

[[3]]
[1] 1

Ideally the names should be c("c","b","a").

Is this possible? And why does dplyr reverse the order of the groups? Note that in my case the result of the do operation is an lm object.

Edit: A comment asks for a realistic example, so here is what I had in mind. I fit a model to each group of the data (dummy code):

res <- do(group_by(data,Index),lm,formula=y~x)

Now I want to do various things like

sapply(res,coef)

So I want to relate the results back to the original dataset, in this case to know which Index each set of coefficients corresponds to.
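For comparison, here is a minimal base R sketch of the same idea that does not go through do at all. It assumes a data frame with columns Index, x and y; the simulated data below is purely hypothetical and only stands in for the real dataset.

```r
# Hypothetical data standing in for `data`; the column names
# Index, x and y are assumptions taken from the dummy code above.
set.seed(1)
data <- data.frame(
  Index = rep(c("a", "b", "c"), each = 10),
  x = rnorm(30)
)
data$y <- 2 * data$x + rnorm(30)

# split() returns a list named by the values of Index,
# and lapply() keeps those names on the list of fitted models.
res <- lapply(split(data, data$Index), function(d) lm(y ~ x, data = d))

# Each column of the coefficient matrix is labelled with its group.
coefs <- sapply(res, coef)
colnames(coefs)
```

Because the names travel with the list, any later sapply or lapply over res stays tied to the group it came from.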

Edit 2: The desired behaviour can be achieved with the dlply function from plyr:

dlply(it,~ind,function(d)min(d[,"var1"]))

$a
[1] 1

$b
[1] 3

$c
[1] 5

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  ind
1   a
2   b
3   c

I would like to know whether this behaviour can be replicated with dplyr, preferably with minimal intervention.

mpiktas

2 Answers

Try this modified version of do.grouped_df, which names the result by group:

do2 <- function (.data, .f, ...) {
    if (is.null(attr(.data, "indices"))) {
        .data <- dplyr:::grouped_df_impl(.data, attr(.data, "vars"), 
            attr(.data, "drop"))
    }
    index <- attr(.data, "indices")
    out <- vector("list", length(index))
    for (i in seq_along(index)) {
        subs <- .data[index[[i]] + 1L, , drop = FALSE]
        out[[i]] <- .f(subs, ...)
    }
    nms <- as.character(attr(.data, "labels")[[1]])
    setNames(out, nms)
}

library(dplyr)

it %.% group_by(ind) %.% do2(function(x) min(x$var1))

which gives:

$a
[1] 1

$b
[1] 3

$c
[1] 5

It could also be combined with fn$ from the gsubfn package like this to shorten it slightly:

library(dplyr)
library(gsubfn)

it %.% group_by(ind) %.% fn$do2(~ min(x$var1))

giving the same answer.

G. Grothendieck
  • Thanks. Apparently the next version of **dplyr** will have a more sophisticated version of `do`, which will probably include this behaviour: https://github.com/hadley/dplyr/pull/283 – mpiktas Feb 25 '14 at 13:09

You can create a data.frame within your function:

 mods <- do(group_by(it,ind),function(x)
        data.frame(it=unique(as.character(x$ind)),val=min(x$var1)))

Then:

do.call(rbind,mods)
  it val
1  a   1
2  b   3
3  c   5

EDIT

 mods <- do(group_by(it,ind),
      function(x) setNames(list(min(x$var1)),unique(as.character(x$ind))))

unlist(mods,recursive=FALSE)
$a
[1] 1

$b
[1] 3

$c
[1] 5
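
The flattening step is what turns the per-group names into names of the final list. A small base R sketch of just that mechanism, assuming the toy data frame from the question (rebuilt here so the block runs on its own) and using lapply in place of do:

```r
# Toy data as in the question (only the columns used here).
it <- data.frame(ind = c("a", "a", "b", "b", "c"),
                 var1 = c(1, 2, 3, 4, 5))

# Each group yields a one-element list that carries its own name.
groups <- as.character(unique(it$ind))
pieces <- lapply(groups, function(g) {
  x <- it[it$ind == g, ]
  setNames(list(min(x$var1)), g)
})

# Removing one level of nesting promotes the inner names
# to the names of the resulting flat list.
flat <- unlist(pieces, recursive = FALSE)
names(flat)
```

Since each element stays wrapped in a list until the final unlist, the value can be any R object, such as a fitted model, and not only a number.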
agstudy
  • Thanks, but I want to get a list, since an element of a list can in theory be any R object, which in general cannot easily be put into a data.frame. – mpiktas Feb 24 '14 at 15:17
  • so change `data.frame` to `list`... Plus your example has the data originating from a data frame – rawr Feb 24 '14 at 15:29
  • @mpiktas see my edit. Of course a list element can theoretically be any R object, but in practice it is hard to create a list of elements with different lengths using a group-by action. – agstudy Feb 24 '14 at 15:30