Assigning names to the list output of dplyr do operation

Question

The do function in the dplyr package usually returns a list. Is there a way to assign names to that list depending on the input to do? Specifically, I pass it the result of group_by and would like the names of the list to indicate which group each element corresponds to.

Here is a toy example of what I want to achieve:

> library(dplyr)
> it = data.frame(ind=c("a","a","b","b","c"),var1=c(1,2,3,4,5), var2=c(2,3,4,2,2))
> group_by(it,ind)%.%summarise(min(var1))
Source: local data frame [3 x 2]

  ind min(var1)
1   c         5
2   b         3
3   a         1

Now do the same with do:

> do(group_by(it,ind),function(x)min(x[,"var1"]))
[[1]]
[1] 5

[[2]]
[1] 3

[[3]]
[1] 1

Ideally the names should be c("c","b","a").

Is this possible? And why does dplyr reverse the sort order of the groups? Note that in my case the result of the do operation is an lm object.

Edit: A comment asks for a realistic example; here is what I had in mind. I fit models depending on the data (dummy code):

res <- do(group_by(data,Index),lm,formula=y~x)

Now I want to do various things like

sapply(res,coef)

So I want to relate the results to the original dataset, in this case to know which Index each set of coefficients corresponds to.

Edit 2: The desired behaviour can be achieved with the dlply function from plyr:

dlply(it,~ind,function(d)min(d[,"var1"]))

$a
[1] 1

$b
[1] 3

$c
[1] 5

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  ind
1   a
2   b
3   c

I would like to know whether this behaviour can be replicated with dplyr, preferably with minimal intervention.

You could also use `as.list(by(it, it$ind, function(x) min(x[,'var1'])))` to get what you want, no need for `dplyr`. — Matthew Plourde, Feb 24 '14 at 15:11
Oh I know lots of ways how to do that, but I'm asking specifically about dplyr. — mpiktas, Feb 24 '14 at 15:15
@mpiktas why not post a more realistic example of a problem you want to solve? — eddi, Feb 24 '14 at 20:19

G. Grothendieck · Accepted Answer · 2014-02-24T19:08:32.563

Try this modified version of do.grouped_df:

do2 <- function (.data, .f, ...) {
    if (is.null(attr(.data, "indices"))) {
        .data <- dplyr:::grouped_df_impl(.data, attr(.data, "vars"), 
            attr(.data, "drop"))
    }
    index <- attr(.data, "indices")
    out <- vector("list", length(index))
    for (i in seq_along(index)) {
        subs <- .data[index[[i]] + 1L, , drop = FALSE]
        out[[i]] <- .f(subs, ...)
    }
    nms <- as.character(attr(.data, "labels")[[1]])
    setNames(out, nms)
}

library(dplyr)

it %.% group_by(ind) %.% do2(function(x) min(x$var1))

which gives:

$a
[1] 1

$b
[1] 3

$c
[1] 5

It could also be combined with fn$ from the gsubfn package like this to shorten it slightly:

library(dplyr)
library(gsubfn)

it %.% group_by(ind) %.% fn$do2(~ min(x$var1))

giving the same answer.
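
Note that do2 reaches into dplyr internals (the "indices" and "labels" attributes and dplyr:::grouped_df_impl), which may differ between dplyr versions. The same loop-and-name idea can be sketched in base R only, here applied to the lm case from the question. The helper name fit_by_group and the simulated data are hypothetical, made up purely for illustration:

```r
# Hypothetical data: three groups with a roughly linear y ~ x relationship.
set.seed(1)
dat <- data.frame(Index = rep(c("a", "b", "c"), each = 5),
                  x = rep(1:5, times = 3))
dat$y <- 2 * dat$x + rnorm(nrow(dat))

# Hypothetical helper: apply f to each group's rows and name the
# resulting list by the group key (same idea as do2, without dplyr).
fit_by_group <- function(data, group, f, ...) {
  keys <- unique(as.character(data[[group]]))
  out <- vector("list", length(keys))
  for (i in seq_along(keys)) {
    subs <- data[data[[group]] == keys[i], , drop = FALSE]
    out[[i]] <- f(subs, ...)
  }
  setNames(out, keys)
}

res <- fit_by_group(dat, "Index", function(d) lm(y ~ x, data = d))

# One column of coefficients per group, labelled by the group key.
coefs <- sapply(res, coef)
coefs
```

Because the list carries the group keys as names, sapply(res, coef) returns a matrix whose columns are labelled by Index, which is the link back to the original data that the question asks for.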

Thanks, apparently next version of **dplyr** will have more sophisticated version of `do`, which probably will include such behaviour: https://github.com/hadley/dplyr/pull/283 — mpiktas, Feb 25 '14 at 13:09
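
For readers on a more recent dplyr, a minimal sketch of one way to get a named list, assuming dplyr 0.8.0 or later, where group_map() and group_keys() are available. This is not the API discussed in this thread (the %.% operator and the function-based do() used above belong to the early releases), and the second column is called var2 here as an assumption:

```r
library(dplyr)

it <- data.frame(ind  = c("a", "a", "b", "b", "c"),
                 var1 = c(1, 2, 3, 4, 5),
                 var2 = c(2, 3, 4, 2, 2))

grouped <- group_by(it, ind)

# group_map() returns an unnamed list with one element per group;
# .x is the data for the current group.
res <- group_map(grouped, ~ min(.x$var1))

# group_keys() lists the groups in the same order, so it can supply the names.
names(res) <- as.character(group_keys(grouped)$ind)
res
```

The element for each group can then be looked up by key, for example res$c or res[["a"]].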

agstudy · Answer 2 · 2014-02-24T15:30:17.077

You can create a data.frame within your function:

 mods <- do(group_by(it,ind),function(x)
        data.frame(it=unique(as.character(x$ind)),val=min(x$var1)))

Then:

do.call(rbind,mods)
  it val
1  a   1
2  b   3
3  c   5

EDIT

 mods <- do(group_by(it,ind),
      function(x) setNames(list(min(x$var1)),unique(as.character(x$ind))))

unlist(mods,recursive=FALSE)
$a
[1] 1

$b
[1] 3

$c
[1] 5

edited Feb 24 '14 at 15:30

answered Feb 24 '14 at 15:11

Thanks, but I want a list, since a list element can in theory be any R object, which in general cannot easily be put into a data.frame. – mpiktas Feb 24 '14 at 15:17
so change `data.frame` to `list`... Plus your example has the data originating from a data frame – rawr Feb 24 '14 at 15:29
@mpiktas see my edit. Of course a list element can theoretically be any R object, but in practice it is hard to create a list of elements with different lengths using a group-by operation. – agstudy Feb 24 '14 at 15:30
