To apply a function to several smaller subsets of a larger dataset, I need to split the large dataset by multiple variables. For further use of the child datasets, however, I want to store them in a nested list, with one list level per grouping variable and the group values as the list node names (so that I can use them with rapply).
An example:
head_mtcars <- head(mtcars, 10)
I know from a related post that I can split the dataset using list(data$V1, data$V2), but unfortunately the resulting list keeps all grouping variables on the same level. I would like list nodes such as $`6`$`3`, $`8`$`3`, etc.:
split(head_mtcars, list(head_mtcars$cyl, head_mtcars$gear), drop = T)
$`6.3`
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

$`8.3`
                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
Duster 360        14.3   8  360 245 3.21 3.57 15.84  0  0    3    4

$`4.4`
            mpg cyl  disp hp drat   wt  qsec vs am gear carb
Datsun 710 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1
Merc 240D  24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2
Merc 230   22.8   4 140.8 95 3.92 3.15 22.90  1  0    4    2

$`6.4`
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
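A quick base-R check makes the problem concrete. This is a small sketch that only recreates head_mtcars so it runs on its own; `flat` is an illustrative name. It shows that split() with a list of grouping vectors builds one combined key per group, so every data frame sits directly on the first level and there is no `6` node to descend into.

```r
head_mtcars <- head(mtcars, 10)

# Split by the interaction of cyl and gear, as above
flat <- split(head_mtcars, list(head_mtcars$cyl, head_mtcars$gear), drop = TRUE)

# Each name is a single combined key, not two nested keys
names(flat)           # "6.3" "8.3" "4.4" "6.4"

# There is no separate "6" element to descend into
is.null(flat[["6"]])  # TRUE

# Every element is already a leaf data frame on the first level
all(vapply(flat, is.data.frame, logical(1)))  # TRUE
```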
I also tried changing the separator, but this does not help either:
## only changes the naming separator to a $ but does not actually create a new list level:
split(head_mtcars, list(head_mtcars$cyl, head_mtcars$gear), drop = T, sep = "$")
$`6$3`
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

$`8$3`
                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
Duster 360        14.3   8  360 245 3.21 3.57 15.84  0  0    3    4

$`4$4`
            mpg cyl  disp hp drat   wt  qsec vs am gear carb
Datsun 710 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1
Merc 240D  24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2
Merc 230   22.8   4 140.8 95 3.92 3.15 22.90  1  0    4    2

$`6$4`
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
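One way to recover the nesting from this output is to post-process the "$"-separated names. The following is only a sketch under two assumptions: there are exactly two grouping variables, and no group value itself contains "$". The object names (flat, parts, cyl_key, gear_key, nested) are illustrative.

```r
head_mtcars <- head(mtcars, 10)

# Flat split with "$" as the name separator, as above
flat <- split(head_mtcars, list(head_mtcars$cyl, head_mtcars$gear),
              drop = TRUE, sep = "$")

# Break each name such as "6$3" into c("6", "3");
# fixed = TRUE because "$" is a regex metacharacter
parts <- strsplit(names(flat), "$", fixed = TRUE)
cyl_key  <- vapply(parts, `[`, character(1), 1)
gear_key <- vapply(parts, `[`, character(1), 2)

# Group element positions by the cyl part, then name each
# sub-list by the gear part to create the second level
nested <- lapply(split(seq_along(flat), cyl_key),
                 function(i) setNames(flat[i], gear_key[i]))

names(nested)         # "4" "6" "8"
names(nested[["6"]])  # "3" "4"
```

The outer level comes out sorted because split() turns the character keys into a factor. For keys with different numbers of digits (for example "10" and "8") that order would be lexical rather than numeric.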
I also tried adapting the code from a related answer (which uses by()) to multiple splitting variables, but this moves the grouping variables into the dimnames of the result, and I don't know how (or whether it is possible at all) to convert those into nested list levels. With only one grouping variable it works perfectly.
by(head_mtcars, list(head_mtcars$cyl, head_mtcars$gear), identity, simplify = FALSE)
: 4
: 3
NULL
-------------------------------------------------------------
: 6
: 3
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
-------------------------------------------------------------
: 8
: 3
                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
Duster 360        14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
-------------------------------------------------------------
: 4
: 4
            mpg cyl  disp hp drat   wt  qsec vs am gear carb
Datsun 710 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1
Merc 240D  24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2
Merc 230   22.8   4 140.8 95 3.92 3.15 22.90  1  0    4    2
-------------------------------------------------------------
: 6
: 4
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
-------------------------------------------------------------
: 8
: 4
NULL
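Underneath its print method, the by() result is a list-matrix: the dimnames hold the cyl levels (rows) and the gear levels (columns), and the empty combinations are NULL cells. A hedged sketch of converting it into the desired nested list, assuming exactly two grouping variables (m and nested are illustrative names), walks the dimnames and drops the NULL cells:

```r
head_mtcars <- head(mtcars, 10)

b <- by(head_mtcars, list(head_mtcars$cyl, head_mtcars$gear), identity,
        simplify = FALSE)

# Drop the "by" class to work with the plain list-matrix underneath
m <- unclass(b)
cyl_levels  <- dimnames(m)[[1]]  # row labels (cyl)
gear_levels <- dimnames(m)[[2]]  # column labels (gear)

# For every cyl level (row), collect its gear cells (columns) into a
# named list and drop the NULL cells that mark empty combinations
nested <- lapply(setNames(cyl_levels, cyl_levels), function(cy) {
  cells <- lapply(setNames(gear_levels, gear_levels),
                  function(g) m[[cy, g]])
  Filter(Negate(is.null), cells)
})

names(nested)         # "4" "6" "8"
names(nested[["4"]])  # "4"
```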
I also tried various tidyverse approaches, but none of them really solved the problem either.
In the end, I would like to have a nested list with the levels of $cyl as the first level and the levels of $gear as the level below it. Any advice?
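A sketch of the target structure in base R only: a recursive helper that splits by the first grouping variable and then recurses into each piece with the remaining variables, so it is not limited to two variables. The name nested_split() is hypothetical; it is not an existing function.

```r
# Hypothetical helper: recursively split `df` by the columns named in `vars`
nested_split <- function(df, vars) {
  # Base case: no grouping variables left, so this data frame is a leaf
  if (length(vars) == 0) return(df)
  # Split by the first variable (only values present in df become nodes),
  # then recurse into each piece with the remaining variables
  lapply(split(df, df[[vars[1]]], drop = TRUE),
         nested_split, vars = vars[-1])
}

head_mtcars <- head(mtcars, 10)
res <- nested_split(head_mtcars, c("cyl", "gear"))

names(res)          # "4" "6" "8"
names(res[["6"]])   # "3" "4"
res[["6"]][["3"]]   # the Hornet 4 Drive and Valiant rows
```

Because each inner split() only sees the rows of its own cyl group, combinations that do not occur (such as cyl 4 with gear 3) never appear, and the leaves are ordinary data frames.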