Given a dataset with several unique values in a column, I'd like to split it into one data frame per unique value, but with each data frame nested one level down. Essentially, I want to add an extra level to the output of split().

For instance, using the built-in iris table as an example:

iris
mylist <- split(iris, iris$Species)

This produces a list, mylist, that contains 3 data frames: setosa, versicolor, and virginica.

mylist[["setosa"]]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa

But I would actually like to nest that data frame in a sublist element called results while keeping the upper-level list name as setosa, such that:

mylist$setosa[["results"]]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa

I could do this with manual manipulation, but I'd like this to run automatically. I've tried unsuccessfully with mapply:

mapply(function(names, df) 
   names <- split(df, df[["Species"]]), 
   unique(iris$Species), iris)

Any advice? I'm also happy to use a tidyverse package such as tidyr if that makes things easier.

moxed
  • `dplyr::group_by(iris, Species) %>% tidyr::nest() %>% { set_names(.$data, .$Species) }` ; `split(iris, iris$Species) %>% as.list()` – hrbrmstr Oct 11 '18 at 18:46

2 Answers

Consider by (an object-oriented wrapper to tapply), which is very similar to split but lets you run a function on each subset. Many useRs run split + lapply, unaware that both can be replaced with by:

mylist <- by(iris, iris$Species, function(sub) list(results=sub), simplify = FALSE)

head(mylist$setosa$results)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

head(mylist$versicolor$results)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 51          7.0         3.2          4.7         1.4 versicolor
# 52          6.4         3.2          4.5         1.5 versicolor
# 53          6.9         3.1          4.9         1.5 versicolor
# 54          5.5         2.3          4.0         1.3 versicolor
# 55          6.5         2.8          4.6         1.5 versicolor
# 56          5.7         2.8          4.5         1.3 versicolor

head(mylist$virginica$results)
#     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 101          6.3         3.3          6.0         2.5 virginica
# 102          5.8         2.7          5.1         1.9 virginica
# 103          7.1         3.0          5.9         2.1 virginica
# 104          6.3         2.9          5.6         1.8 virginica
# 105          6.5         3.0          5.8         2.2 virginica
# 106          7.6         3.0          6.6         2.1 virginica
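
One thing that may matter downstream: by returns an object of class "by" rather than a plain list. If a plain list is needed, the split + lapply pattern mentioned above gives the same nesting. The following is a minimal sketch of that comparison; the names by_result and plain are illustrative only.

```r
# Nested structure built with by(); the result carries class "by"
by_result <- by(iris, iris$Species, function(sub) list(results = sub), simplify = FALSE)
class(by_result)
# [1] "by"

# Same nesting built with split + lapply; the result is a plain named list
plain <- lapply(split(iris, iris$Species), function(sub) list(results = sub))
class(plain)
# [1] "list"

# Both routes expose the per-species data frame under $results
nrow(by_result$setosa$results)
nrow(plain$setosa$results)
```

Either form supports mylist$setosa$results; the plain list is probably the safer choice if the result is later passed to code that checks is.list-style structure or strips attributes.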
Parfait

Using setNames inside lapply keeps the names of the list you're iterating through:

iris
mylist <- split(iris, iris$Species)
mylist2 <- lapply(setNames(names(mylist), names(mylist)), function(x){
  list(results = mylist[[x]])
})
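
Since the question asks for this to run automatically, the same idea could be wrapped in a small function. This is only a sketch: nest_split and its arguments are hypothetical names, not part of base R or any package. It relies on lapply preserving the names of the list it iterates over, so the species names produced by split carry through without an explicit setNames on the outer list.

```r
# nest_split() is a hypothetical helper written for illustration
nest_split <- function(df, col, name = "results") {
  # one data frame per unique value of df[[col]], named by that value
  pieces <- split(df, df[[col]])
  # wrap each piece in a one-element list whose element is called `name`
  lapply(pieces, function(piece) setNames(list(piece), name))
}

nested <- nest_split(iris, "Species")
names(nested)
# [1] "setosa"     "versicolor" "virginica"
names(nested$setosa)
# [1] "results"
nrow(nested$setosa$results)
# [1] 50
```

Here setNames is used on the inner list so that the sublist name can be passed as an argument instead of being hard-coded.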
Beemyfriend