I am trying to move away from `rowwise()` for list-columns, as I have heard that the tidyverse team is in the process of axing it. However, I am not used to the purrr functions, so I feel like there must be a better way of doing the following:

I create a list-column containing a tibble for each species. I then want to go into each tibble and take the mean of certain variables. The first example below uses `map_dbl()`; the second is the `rowwise()` solution, which I personally find cleaner.

Does anyone know a better way to use map in this situation?

library(tidyverse)
iris %>% 
  group_by(Species) %>% 
  nest() %>% 
  mutate(mean_slength = map_dbl(data, ~mean(.$Sepal.Length, na.rm = TRUE)),
         mean_swidth = map_dbl(data, ~mean(.$Sepal.Width, na.rm = TRUE))
         )
#> # A tibble: 3 x 4
#>   Species    data              mean_slength mean_swidth
#>   <fct>      <list>                   <dbl>       <dbl>
#> 1 setosa     <tibble [50 x 4]>         5.01        3.43
#> 2 versicolor <tibble [50 x 4]>         5.94        2.77
#> 3 virginica  <tibble [50 x 4]>         6.59        2.97

iris %>% 
  group_by(Species) %>% 
  nest() %>% 
  rowwise() %>% 
  mutate(mean_slength = mean(data$Sepal.Length, na.rm = TRUE),
         mean_swidth = mean(data$Sepal.Width, na.rm = TRUE))
#> Source: local data frame [3 x 4]
#> Groups: <by row>
#> 
#> # A tibble: 3 x 4
#>   Species    data              mean_slength mean_swidth
#>   <fct>      <list>                   <dbl>       <dbl>
#> 1 setosa     <tibble [50 x 4]>         5.01        3.43
#> 2 versicolor <tibble [50 x 4]>         5.94        2.77
#> 3 virginica  <tibble [50 x 4]>         6.59        2.97

Created on 2018-12-26 by the reprex package (v0.2.1)

Hank Lin
  • You don't actually need `rowwise` or `purrr` functions. `iris %>% group_by(Species) %>% summarise(mean_slength = mean(Sepal.Length), mean_swidth = mean(Sepal.Width))` gives the same output. – Ronak Shah Dec 27 '18 at 01:24
  • @RonakShah This is a watered-down example. My real problem involves going into tibbles contained in a list column and applying functions, so your solution doesn't work for me unfortunately :(. Thank you though! – Hank Lin Dec 27 '18 at 01:27
  • It's much of a muchness, really. In what way do you feel the `rowwise()` solution is cleaner? I could see how having to specify `map_dbl` for each column might get old. I think the preference for `map_*` comes partly down to its explicitness: you don't need to track back up the pipe to find `rowwise()` in order to understand what's happening. (Though you could argue the same about `group_by()`.) – jimjamslam Dec 27 '18 at 03:00
  • It's also clear with `map()` functions _which_ columns you're iterating over the elements of, and you can mix columns to iterate over with columns to not iterate over. But if you're happy with `rowwise()` and it's not giving you any problems, I wouldn't beat myself up over it. It's hard to provide more explicit guidance without knowing what you actually want to do with it. – jimjamslam Dec 27 '18 at 03:05

1 Answer

Instead of calling `map` twice, use a single `map` with `summarise_at` to summarise every matching column at once, then `unnest` the result:

library(tidyverse)
iris %>% 
   group_by(Species) %>% 
   nest() %>% 
   mutate(out = map(data, ~ 
               .x %>% 
                 summarise_at(vars(matches('Sepal')), 
                              funs(mean_s = mean(., na.rm = TRUE))))) %>% 
   unnest(out)
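
As a hedged aside: `funs()` has been deprecated since dplyr 0.8.0, and dplyr 1.0.0 introduced `across()` as the successor to the `summarise_at()` family. The sketch below is one way the same single-`map` idea might look with `across()`. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0; the names `result`, `d`, and `col` are only illustrative and do not come from the code above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Assumes dplyr >= 1.0.0 (for across()) and tidyr >= 1.0.0.
# `result`, `d` and `col` are illustrative names, not from the original answer.
result <- iris %>%
  group_by(Species) %>%
  nest() %>%
  # One map() call per nested tibble; across() handles every matching column.
  mutate(out = map(data, function(d) {
    summarise(
      d,
      across(
        matches("Sepal"),
        function(col) mean(col, na.rm = TRUE),
        .names = "{.col}_mean_s"  # e.g. Sepal.Length_mean_s
      )
    )
  })) %>%
  unnest(out) %>%
  ungroup()

result
```

Each element of `out` is a one-row tibble, so `unnest(out)` spreads the summaries into ordinary columns next to `Species` and `data`. Adding another summary column then only requires changing the selection inside `across()`, not adding another `map_dbl()` call.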
akrun