purrr split %>% map %>% bind VERSUS dplyr group_by %>% do

Question

I am often in the position of wanting to split-apply-combine regression models. I've found two ways of doing it, the "purrr" approach and the "dplyr::do()" approach.

Issue with the purrr approach: I want the resulting data.frame to have columns giving the levels of the variables the split was done by, as in a normal group_by %>% summarize operation.

Issue with the dplyr::do() approach: there's a nasty tangle of nested calls, do(tidy(lm_robust(...))), that is decidedly inelegant. But I do get the grouping columns back.

Main Q: is there a way to do split-apply-combine in purrr that returns the variable splits nicely?

The minimal working example below shows that the problem depends on how many variables you're splitting by.

library(tidyverse)
library(estimatr) # for lm_robust

# splitting by one variable

# the purrr approach
mtcars %>%
  split(.$am) %>%
  map(~lm_robust(mpg ~ hp, data = .)) %>%
  map_df(tidy, .id = "am") # annoying to have to type "am" again!

# the dplyr do() approach
mtcars %>%
  group_by(am) %>%
  do(tidy(lm_robust(mpg ~ hp, data = .))) # gross nesting

# Splitting by two variables

# the purrr approach??
mtcars %>%
  split(list(.$am, .$vs)) %>%
  map(~lm_robust(mpg ~ hp, data = .))  %>%
  map_df(tidy, .id = "OH NO") # the column encodes both am and vs info

# the dplyr do() approach works great
mtcars %>%
  group_by(am, vs) %>%
  do(tidy(lm_robust(mpg ~ hp, data = .))) # still nested up.
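For the two-variable purrr case, one way to recover separate am and vs columns is to split the combined id back apart. This is a minimal sketch, assuming that split() on a list of variables names each piece "am.vs" (for example "0.1") using its default "." separator, and that tidyr's separate() is available; the column name group is just an illustrative choice.

```r
library(tidyverse)
library(estimatr) # for lm_robust() and its tidy() method

res <- mtcars %>%
  split(list(.$am, .$vs)) %>%                 # pieces named "0.0", "1.0", "0.1", "1.1"
  map(~ lm_robust(mpg ~ hp, data = .x)) %>%   # one model per (am, vs) piece
  map_df(tidy, .id = "group") %>%             # "group" holds the combined "am.vs" name
  separate(group, into = c("am", "vs"), sep = "\\.", convert = TRUE)  # back to two columns
```

One caveat of this design: it would mis-split if a level of either variable itself contained a dot. With tidyr 1.3.0 or later, separate_wider_delim() is the newer alternative to separate().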

EDIT

Here's a way that uses nest() and unnest(). Clunky, but maybe the best purrr approach? Inspired by http://stat545.com/block024_group-nest-split-map.html

mtcars %>%
  group_by(am, vs) %>%
  nest() %>%
  mutate(fit = map(data, ~lm_robust(mpg ~ hp, data = .)),
         tidy = map(fit, tidy)) %>%
  select(am, vs, tidy) %>%
  unnest(tidy)
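With tidyr 1.0.0 or later, nest() can do the grouping itself, and the model and its tidy table can be built in one map() call, so the group_by() and the separate fit column are not needed. A sketch, assuming tidyr >= 1.0.0 for the nest(data = -c(am, vs)) and unnest(cols) forms; the column name coefs is illustrative.

```r
library(tidyverse)
library(estimatr) # for lm_robust() and its tidy() method

res <- mtcars %>%
  nest(data = -c(am, vs)) %>%                 # one row per (am, vs); other columns nested in `data`
  mutate(coefs = map(data, ~ tidy(lm_robust(mpg ~ hp, data = .x)))) %>%  # fit and tidy per group
  select(-data) %>%                           # drop the nested raw data
  unnest(coefs)                               # one row per group and coefficient
```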

EDIT 2

Here's a way with group_map that's just as ugly as do, but maybe that's just the way it goes.

mtcars %>%
  group_by(am, vs) %>%
  group_map(~tidy(lm_robust(mpg ~ hp, data = .x)))
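In current dplyr releases, group_map() returns a plain list, without the am and vs columns. Its sibling group_modify() takes the same kind of formula, requires each call to return a data frame (which tidy() does), and puts the grouping columns back in front of the result. A minimal sketch, assuming a dplyr version that has group_modify():

```r
library(tidyverse)
library(estimatr) # for lm_robust() and its tidy() method

res <- mtcars %>%
  group_by(am, vs) %>%
  group_modify(~ tidy(lm_robust(mpg ~ hp, data = .x))) %>%  # .x is one group's rows, without am and vs
  ungroup()                                                 # am and vs come back as the first columns
```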

EDIT 3

I guess what would seem beautiful to me is a pipeline that does one thing per line, something like this, but I take the point made in the comments below that group_map is already pretty close.

# does not work: map() iterates over the columns of the data frame, ignoring the grouping
mtcars %>%
  group_by(am, vs) %>%
  map(~lm_robust(mpg ~ hp, data = .)) %>%
  map_df(tidy)
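A one-step-per-line pipeline that does work is possible if the group keys and the per-group data are pulled out separately and lined up: group_keys() gives one row per group, and group_split() returns the groups' data in that same order. A sketch under those assumptions; by_am_vs and coefs are hypothetical names chosen for illustration.

```r
library(tidyverse)
library(estimatr) # for lm_robust() and its tidy() method

by_am_vs <- mtcars %>% group_by(am, vs)  # hypothetical helper: the grouped data

res <- by_am_vs %>%
  group_keys() %>%                                                        # one row per group: am, vs
  mutate(fit = map(group_split(by_am_vs), ~ lm_robust(mpg ~ hp, data = .x))) %>%  # one model per group, same order
  mutate(coefs = map(fit, tidy)) %>%                                      # one coefficient table per model
  select(-fit) %>%
  unnest(coefs)                                                           # one row per group and coefficient
```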

It would be helpful if you could show how the code should look for it not to be termed `ugly`. `EDIT 2` is 3 steps, with one `group_by` and a `group_map` — akrun, May 05 '19 at 21:14
There are redundant steps in the approach in the first edit. You can rewrite it more succinctly as: `mtcars %>% nest(-am, -vs) %>% mutate(tidy = map(data, ~lm_robust(mpg ~ hp, data = .) %>% tidy)) %>% unnest(tidy)`. — Ritchie Sacramento, May 05 '19 at 23:57
What's ugly about the `group_map`? I can't imagine a much more compact approach, and it's quite readable to me. If it's really about the nested parentheses, you can try `group_map(~lm_robust(mpg ~ hp, .) %>% tidy)` on your last line — moodymudskipper, May 06 '19 at 13:13
Maybe I'm missing something, but none of the examples you gave where you complained about nesting are nested for me — Mark, Aug 16 '23 at 09:09
But anyway, you were looking for a pretty answer, so I gave one below! :-) — Mark, Aug 22 '23 at 07:04

score 0 · Answer 1 · answered Aug 16 '23 at 09:15


A one-liner:

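# needs dplyr >= 1.1.0; pick() hands lm_robust() only the current group's rows
mtcars %>% reframe(tidy(lm_robust(mpg ~ hp, data = pick(everything()))), .by = c(am, vs))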
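One thing worth checking with a reframe() one-liner: in a %>% pipeline, a `.` that is nested inside reframe() (rather than passed to it directly) is bound to the whole input data frame, not to the current group, so as far as I can tell every group gets the same full-data fit. dplyr's pick(), added in dplyr 1.1.0 alongside reframe() and .by, returns the current group's columns instead. The comparison below is a sketch for verifying this on your own setup; with_dot and with_pick are illustrative names.

```r
library(tidyverse)  # assumes dplyr >= 1.1.0 for reframe(), .by and pick()
library(estimatr)   # for lm_robust() and its tidy() method

# `.` nested inside reframe() refers to all of mtcars, so each group refits the same data
with_dot <- mtcars %>%
  reframe(tidy(lm_robust(mpg ~ hp, data = .)), .by = c(am, vs))

# pick(everything()) is the current group's rows (grouping columns excluded)
with_pick <- mtcars %>%
  reframe(tidy(lm_robust(mpg ~ hp, data = pick(everything()))), .by = c(am, vs))

# count the distinct coefficient estimates produced by each version
n_distinct(with_dot$estimate)
n_distinct(with_pick$estimate)
```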

Mark
