I want to turn a data frame into a list of lists (a named list with one vector per group), as in this example:

 df <- data.frame(var1=c("A","A","A","B","B","C"),var2=seq(1,6))

   var1 var2
 1    A    1
 2    A    2
 3    A    3
 4    B    4
 5    B    5
 6    C    6

 split(df$var2,df$var1)

 $A
 [1] 1 2 3

 $B
 [1] 4 5

 $C
 [1] 6

However, I wonder whether this can be done with a %>% pipe. I don't know how to use split in a pipe where df is passed in as the input. Or maybe there is another way.

Massimo2013
  • Presumably you could use a construct like `df %>% \(x) {split(x$var2, x$var1)}`? – Paul Stafford Allen Mar 04 '23 at 09:40
  • I don't know the syntax with `\(x)`, but it doesn't seem to work. As for piping directly into split with `df %>% split(.$var1)`, it produces a different result: a list of data frames – Massimo2013 Mar 04 '23 at 09:46
  • Another option: `df %>% dplyr::group_split(var1)` – Chamkrai Mar 04 '23 at 09:52
  • Thank you, but this does not produce a list of lists as in the example, it produces a list of dataframes – Massimo2013 Mar 04 '23 at 09:56
  • A question to @jay.sf who modified my question, replacing `dplyr` with `magrittr`. Is `%>%` a `magrittr` pipe? I ask because I don't need to load `magrittr` to use it, while `magrittr` introduces new pipes that allow further possibilities. I think the title should not include `magrittr` – Massimo2013 Mar 04 '23 at 10:21
  • @Massimo2013 not jay.sf, but yes, `%>%` is from `magrittr` and it's the only "tidyverse" function used in your code, so it's the more appropriate package to tag compared to `dplyr`. Note that all tidyverse packages load it automatically, without you needing to load magrittr explicitly. – Paul Stafford Allen Mar 04 '23 at 10:31
  • I see, but `magrittr` pipes is a larger set than simply `%>%`. I further modified the title of the question because I specifically wanted to solve the problem with `%>%`. Apparently, if we load `magrittr` there are other solutions involving e.g. `%$%`. In any case, thank you for the clarification – Massimo2013 Mar 04 '23 at 10:38
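
The two pipe behaviours raised in the comments above can be checked with a small sketch. It assumes magrittr is attached and R >= 4.1 (for the `\(x)` shorthand); the names `via_pipe`, `direct` and `via_lambda` are illustrative only. As far as I understand magrittr, `%>%` still inserts the left-hand side as the first argument when the dot only appears inside a nested call, and an anonymous function on the right-hand side has to be wrapped in parentheses.

```r
library(magrittr)

df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C"), var2 = seq(1, 6))

# The dot appears only inside the nested call .$var1, so %>% still passes
# df as the first argument: this evaluates split(df, df$var1)
via_pipe <- df %>% split(.$var1)
direct <- split(df, df$var1)

# Wrapping the anonymous function in parentheses makes magrittr apply it to df
via_lambda <- df %>% (\(x) split(x$var2, x$var1))
```

Here `via_pipe` is a list of data frames, whereas `via_lambda` is the list of vectors asked for in the question.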

4 Answers

There are a couple of options, some of them already mentioned in the comments:

df <- data.frame(var1=c("A","A","A","B","B","C"),var2=seq(1,6))


# magrittr pipe
library(magrittr)

# the braces stop %>% from also inserting df as the first argument of split()
df %>% 
  {split(.$var2, .$var1)}

#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6
  
# base pipe (R >= 4.1): define an anonymous function and call it on df
df |>
  (\(x) split(x$var2, x$var1))()
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6

# dplyr
library(dplyr)
library(purrr)

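# names are taken in order of first appearance; this matches the sorted
# group order of group_map() here because var1 is already sorted in df
df %>% 
  group_by(var1) %>% 
  group_map(~ pull(.x, var2)) %>% 
  set_names(unique(df$var1))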
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6

Created on 2023-03-04 with reprex v2.0.2
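
One caveat with the dplyr variant: `unique(df$var1)` returns the keys in order of first appearance, while `group_map()` returns the groups in sorted key order, so the names only line up when the data are already sorted by `var1`. A minimal sketch of one way around this, assuming dplyr and purrr are installed, takes the names from `group_keys()` instead. The unsorted data frame `df2` and the names `grouped` and `res` are made up for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical unsorted input: "B" appears before "A"
df2 <- data.frame(var1 = c("B", "A", "A"), var2 = 1:3)

grouped <- group_by(df2, var1)

# group_keys() lists the keys in the same order in which group_map() visits the groups
res <- grouped %>%
  group_map(~ pull(.x, var2)) %>%
  set_names(group_keys(grouped)$var1)
```

With this input, `res$A` holds the values from the two "A" rows and `res$B` the value from the "B" row, regardless of row order.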

TimTeaFan

Not a dplyr solution, but using magrittr's "exposition" pipe %$% you could do:

df <- data.frame(var1=c("A","A","A","B","B","C"),var2=seq(1,6))

library(magrittr)

df %$%
  split(var2, var1)
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6
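
As a rough point of comparison, the exposition pipe behaves much like base R's `with()`, which also evaluates an expression with the data frame's columns in scope. The sketch below assumes R >= 4.1 for the native pipe and needs no extra packages; `via_with` is an illustrative name.

```r
df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C"), var2 = seq(1, 6))

# with() evaluates split(var2, var1) using the columns of df,
# so this is the same call as with(df, split(var2, var1))
via_with <- df |> with(split(var2, var1))
```

The result is the same named list of vectors as `split(df$var2, df$var1)`.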
stefan

You could use the named_group_split function from this issue (its definition is given below and must be run first) in combination with map and pull, like this:

library(dplyr)
library(purrr)
df %>%
  named_group_split(var1) %>%
  map(~ pull(.x, var2))
#> $A
#> [1] 1 2 3
#> 
#> $B
#> [1] 4 5
#> 
#> $C
#> [1] 6

Created on 2023-03-04 with reprex v2.0.2


Function named_group_split:

named_group_split <- function(.tbl, ...) {
  grouped <- group_by(.tbl, ...)
  names <- rlang::inject(paste(!!!group_keys(grouped), sep = " / "))
  
  grouped %>% 
    group_split() %>% 
    rlang::set_names(names)
}
Quinten

Here is another option, combining split with lapply:

library(dplyr)

df %>% 
  split(.$var1) %>% 
  lapply(function(x) x$var2)
$A
[1] 1 2 3

$B
[1] 4 5

$C
[1] 6
TarJae