Dataframe to list of lists using split in a %>% pipe

Question

I want to turn a dataframe into a list of lists, as in this example:

 df <- data.frame(var1=c("A","A","A","B","B","C"),var2=seq(1,6))

   var1 var2
 1    A    1
 2    A    2
 3    A    3
 4    B    4
 5    B    5
 6    C    6

 split(df$var2,df$var1)

 $A
 [1] 1 2 3

 $B
 [1] 4 5

 $C
 [1] 6

However, I wonder if it is possible to do it using a %>% pipe. I don't know how to use split in a pipe, where df is given as an input. Or maybe there's another way.

Presumably you could use a construct like `df %>% \(x) {split(x$var2, x$var1)}`? – Paul Stafford Allen, Mar 04 '23 at 09:40
I don't know the `\(x)` syntax, but it doesn't seem to work. As for piping directly into split with `df %>% split(.$var1)`, it produces a different result: a list of dataframes. – Massimo2013, Mar 04 '23 at 09:46
Thank you, but this does not produce a list of lists as in the example; it produces a list of dataframes. – Massimo2013, Mar 04 '23 at 09:56
A question to @jay.sf, who modified my question, replacing `dplyr` with `magrittr`: is `%>%` a `magrittr` pipe? I ask because I don't need to load `magrittr` to use it, while `magrittr` introduces new pipes that allow further possibilities. I think the title should not include `magrittr`. – Massimo2013, Mar 04 '23 at 10:21
@Massimo2013 not jay.sf, but yes, `%>%` is from `magrittr`, and it's the only "tidyverse" function used in your code, so it's a more appropriate package to tag than `dplyr`. Note that all tidyverse packages load it automatically, without you needing to load `magrittr` explicitly. – Paul Stafford Allen, Mar 04 '23 at 10:31
I see, but the `magrittr` pipes are a larger set than just `%>%`. I further modified the title of the question because I specifically wanted to solve the problem with `%>%`. Apparently, if we load `magrittr`, there are other solutions involving e.g. `%$%`. In any case, thank you for the clarification. – Massimo2013, Mar 04 '23 at 10:38

score 5 · Accepted Answer · answered Mar 04 '23 at 09:57

There are a couple of options, some of them already mentioned in the comments:

df <- data.frame(var1=c("A","A","A","B","B","C"),var2=seq(1,6))


# magrittr pipe
df %>% 
  {split(.$var2, .$var1)}

#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6
  
# base pipe
df |>
  (\(x) split(x$var2, x$var1))()
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6

# dplyr
library(dplyr)
library(purrr)

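df %>% 
  group_by(var1) %>% 
  group_map(~ pull(.x, var2)) %>% 
  # group_map() returns groups in sorted key order; unique() matches that
  # order here only because var1 is already sorted in df
  set_names(unique(df$var1))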
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6

Created on 2023-03-04 with reprex v2.0.2

answered Mar 04 '23 at 09:57

TimTeaFan


Thank you, the first looks the easiest. I didn't know the use of '{ }', where is it documented? – Massimo2013 Mar 04 '23 at 10:00
@Massimo2013 see the comments to this thread and the docs they link to: https://stackoverflow.com/questions/42623497/when-should-we-use-curly-brackets-when-piping-with-dplyr – Paul Stafford Allen Mar 04 '23 at 10:03
and this explicit answer: https://stackoverflow.com/a/42386886/16730940 – Paul Stafford Allen Mar 04 '23 at 10:04
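To illustrate what the braces change, here is a minimal sketch based on magrittr's documented first-argument rule: when `.` appears only inside a nested call such as `.$var1`, the left-hand side is still inserted as the first argument, whereas wrapping the right-hand side in `{ }` turns that insertion off. The names `by_frame` and `by_vector` are made up for this illustration.

```r
library(magrittr)

df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C"), var2 = seq(1, 6))

# No braces: `.` occurs only in the nested call .$var1, so df is also passed
# as the first argument. This is equivalent to split(df, df$var1).
by_frame <- df %>% split(.$var1)

# Braces: nothing is inserted automatically; `.` is used only where written.
# This is equivalent to split(df$var2, df$var1).
by_vector <- df %>% {split(.$var2, .$var1)}

is.data.frame(by_frame$A)   # each element is a data frame
is.atomic(by_vector$A)      # each element is a plain vector
```

This would explain why `df %>% split(.$var1)` returns a list of dataframes, as noted in the comments on the question.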

score 4 · Answer 2 · answered Mar 04 '23 at 10:00

Not a dplyr solution, but using magrittr's "exposition" pipe `%$%` you could do:

df <- data.frame(var1=c("A","A","A","B","B","C"),var2=seq(1,6))

library(magrittr)

df %$%
  split(var2, var1)
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6
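
For comparison, a similar effect seems achievable without magrittr. The sketch below assumes R 4.1 or later (for the native pipe `|>`) and uses base R's `with()`, which evaluates an expression with the data frame's columns visible by name. The variable `res` is only an illustrative name.

```r
df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C"), var2 = seq(1, 6))

# The native pipe passes df as the first argument of with(), so this is
# with(df, split(var2, var1)): var2 and var1 are looked up inside df.
res <- df |> with(split(var2, var1))
res
```

Like `%$%`, this exposes the columns rather than the data frame itself, so `split()` receives the two vectors directly.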

answered Mar 04 '23 at 10:00

stefan


score 1 · Answer 3 · answered Mar 04 '23 at 09:55

You can use the `named_group_split` function written in this issue (defined below) in combination with `map` and `pull`. Called without a column argument, `pull(.x)` extracts the last column, which here is `var2`:

library(dplyr)
library(purrr)
df %>%
  named_group_split(var1) %>%
  map(~ pull(.x))
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6

Created on 2023-03-04 with reprex v2.0.2

Function named_group_split:

named_group_split <- function(.tbl, ...) {
  grouped <- group_by(.tbl, ...)
  names <- rlang::inject(paste(!!!group_keys(grouped), sep = " / "))
  
  grouped %>% 
    group_split() %>% 
    rlang::set_names(names)
}

score 1 · Answer 4 · answered Mar 04 '23 at 14:26

Here is another option, combining `split()` with `lapply()`: split the dataframe by `var1` first, then extract `var2` from each piece.

library(dplyr)

df %>% 
  split(.$var1) %>% 
  lapply(function(x) x$var2)

$A
[1] 1 2 3

$B
[1] 4 5

$C
[1] 6

answered Mar 04 '23 at 14:26

TarJae

