A slightly changed example from the R help for do():

by_cyl <- group_by(mtcars, cyl)
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
coefficients <- models %>% do(data.frame(coef = coef(.$mod)[[1]]))

The data frame coefficients holds the first coefficient of the linear model for each cyl group. How can I produce a data frame that contains not only a column with the coefficients but also a column with the grouping variable?

===== Edit: I have extended the example to make my problem clearer

Let's suppose that I want to extract the coefficients of the model and some prediction. I can do this:

by_cyl <- group_by(mtcars, cyl)
getpars <- function(df){
  fit <- lm(mpg ~ disp, data = df)
  data.frame(intercept=coef(fit)[1],slope=coef(fit)[2])
}
getprediction <- function(df){
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp= x), type = "response")
  data.frame(x,y)
}
pars <- by_cyl %>% do(getpars(.))
prediction <- by_cyl %>% do(getprediction(.))

The problem is that the code is redundant because it fits the model twice. My idea was to build a function that returns a list with all the information:

getAll <- function(df){
  results <- list()
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp = x), type = "response")

  results$pars <- data.frame(intercept = coef(fit)[1], slope = coef(fit)[2])
  results$prediction <- data.frame(x, y)

  results
}

The problem is that I don't know how to use do() with getAll to obtain, for example, just a data frame with the parameters (like the data frame pars).

danilinares
  • Not sure if this helps. You can use `summarise` instead of the second `do`: `summarise(models, coef = coef(summary(mod))[[1]], group = cyl)` – akrun Jul 05 '14 at 17:16
  • It's a bug, and I'll fix it as soon as I figure out how. – hadley Jul 05 '14 at 22:11
  • @hadley Has this been fixed? Could you please point to the github issue? – Rosen Matev Oct 22 '14 at 09:58
  • @RosenMatev Did you find anything about the issue? – danilinares Nov 24 '14 at 12:03
  • According to Hadley, it might be solved in dplyr 0.4 – danilinares Nov 24 '14 at 16:36
  • As far as I can tell, the issue with akrun's solution is that it only returns numeric values. I have a data set where I would like to report the grouping variable, but it converts the factor levels to numbers. I prefer Robert Krzyzanowski's solution. – spindoctor Jun 05 '15 at 09:07

2 Answers


Like this?

coefficients <- models %>% do(data.frame(coef = coef(.$mod)[[1]], group = .[[1]]))

yielding

        coef group
  1 40.87196     4
  2 19.08199     6
  3 22.03280     8
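
On the follow-up about picking up the grouping automatically, here is a minimal sketch, assuming dplyr's behaviour of prefixing the group labels when do() is called on a grouped data frame with an unnamed data-frame result. The model is fitted inside that single do() call, so no positional `.[[1]]` is needed. Note that do() is superseded in recent dplyr versions, so treat this as illustrative rather than the recommended route.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# Unnamed do(): the returned data frame gets the grouping column(s) added,
# so only the group_by() call would change if more grouping variables are used.
coefficients <- by_cyl %>%
  do(data.frame(coef = coef(lm(mpg ~ disp, data = .))[[1]]))

print(coefficients)
```

The trade-off is that the fitted model objects are not kept, so this suits cases where only summary values per group are needed.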
Robert Krzyzanowski
  • Thanks, something like that. I wonder whether it is possible to have something that automatically uses the grouping in group_by. So if, for example, group_by changes to group_by(mtcars, cyl, am), it would not be necessary to use group=.[[1]] and group2=.[[2]] inside the do(). – danilinares Jul 05 '14 at 16:02
  • I think it's even simpler; try `coefficients <- models %>% do(data.frame(coef=coef(.$mod), group = .[[1]], var = names(coef(.$mod))))` – gregmacfarlane Jul 18 '14 at 12:11
  • I know this is old at this point, but this really helped me. `do(data.frame(group = .[[1]], a=coef(.$mod)[1], b=coef(.$mod)[2], r2 = summary(.$mod)$r.squared))` This gets the entire equation for plotting, along with the group_by variable. – bhive01 Aug 26 '15 at 17:34

Using the approach of Hadley Wickham in this video:

library(dplyr)
library(tidyr)  # nest() and unnest() come from tidyr
library(purrr)
library(broom)

fitmodel <- function(d) lm(mpg ~ disp, data = d)
by_cyl <- mtcars %>% 
  group_by(cyl) %>% 
  nest() %>%
  mutate(mod = map(data, fitmodel), 
         pars = map(mod, tidy), 
         pred = map(mod, augment))

pars <- by_cyl %>% unnest(pars)
prediction <- by_cyl %>% unnest(pred)
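
The same nest-and-map pattern can also be combined with the getAll function from the question, so the model is fitted once per group and both pieces are pulled out of the returned list. This is only a sketch: getAll is repeated here so the block is self-contained, and the names `res` and `by_cyl_all` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# getAll as defined in the question: one fit, two data frames in a list
getAll <- function(df){
  results <- list()
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp = x), type = "response")
  results$pars <- data.frame(intercept = coef(fit)[1], slope = coef(fit)[2])
  results$prediction <- data.frame(x, y)
  results
}

by_cyl_all <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(res = map(data, getAll),              # fit once per group
         pars = map(res, "pars"),              # extract list element by name
         prediction = map(res, "prediction"))

# Keep the grouping column plus one list-column, then expand it
pars <- by_cyl_all %>% select(cyl, pars) %>% unnest(pars)
prediction <- by_cyl_all %>% select(cyl, prediction) %>% unnest(prediction)
```

Both resulting data frames carry the cyl column, and adding variables to group_by() should carry through in the same way.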
danilinares