How to Split a df into unique groups?

Question

I have the following data.frame. Here are its first few rows and its columns:

SubPop      Origin   grid_code
 AL           2008   4.730380
 AL           2008   5.552315
 AL           2008   5.968850
 AL           2008   5.128384
 AL           2009   6.927450
 AL           2009   7.135734
 ALCentral    2008   7.381087
 ALCentral    2008   6.232927
 ALCentral    2009   6.431800
 ALCentral    2009   6.690246
 ALCentral    2009   6.794144

I'd like to split this data.frame into groups, one for each unique combination of the attributes SubPop and Origin. The whole data.frame has 48 unique combinations of SubPop and Origin.

As my final output I'd like a list of 48 elements, each containing only the rows of that group. For example, the first group, "AL and 2008", would hold all the rows of my data.frame with SubPop = AL and Origin = 2008, and so on.

> unique<-unique(df[,c("SubPop", "Origin")])
> unique<-unique[order(unique$SubPop, unique$OriginT),]
> df_split<-split(df, unique)

With this code I can find the unique combinations of attributes, but the split assigns rows to groups seemingly at random.

Sorry if it's confusing...
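
One likely reading of why the attempt above misgroups rows: `split()` expects a grouping factor with one entry per row of the data, but the table of unique combinations has only one row per combination. R then recycles that short factor down the rows (possibly with a warning), so rows land in groups cyclically. A minimal sketch on the sample rows, where `xy`, `keys` and `bad` are illustrative names and the column is assumed to be called `Origin`:

```r
# Sample rows from the question (the name `xy` is illustrative).
xy <- read.table(text = "SubPop Origin grid_code
AL        2008 4.730380
AL        2008 5.552315
AL        2008 5.968850
AL        2008 5.128384
AL        2009 6.927450
AL        2009 7.135734
ALCentral 2008 7.381087
ALCentral 2008 6.232927
ALCentral 2009 6.431800
ALCentral 2009 6.690246
ALCentral 2009 6.794144", header = TRUE)

# One row per unique combination: far fewer rows than xy.
keys <- unique(xy[, c("SubPop", "Origin")])

# Using `keys` as the grouping factor: it has nrow(keys) entries,
# which are recycled across the nrow(xy) rows of the data.
bad <- suppressWarnings(split(xy, keys))

# The group labelled AL.2008 now mixes several combinations.
unique(bad[["AL.2008"]][, c("SubPop", "Origin")])
```

The grouping factor has to be built from columns of the full data.frame so that it lines up row by row.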

The easiest way is probably `myList <- split(df, interaction(df$SubPop, df$Origin))` which will return a named list of data.frames split by the interaction of these two variables. — lmo, Jun 22 '17 at 17:34
I'm not sure it's a good idea to split a homogeneous data structure into small chunks of data and treat each one individually. Suppose you were a database admin: would you create 48 individually named tables just because two attributes were changing? — Uwe, Jun 23 '17 at 07:09
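
As a sketch of the one-liner from lmo's comment applied to the sample rows (the names `xy`, `groups` and `groups2` are illustrative): the grouping factor is built from columns of the data itself, so it has one entry per row. The `drop = TRUE` argument is an addition here, assumed to be useful when not every SubPop/Origin pair occurs in the data, since `split()` otherwise returns an empty data.frame for each unused combination.

```r
# Sample rows from the question (the name `xy` is illustrative).
xy <- read.table(text = "SubPop Origin grid_code
AL        2008 4.730380
AL        2008 5.552315
AL        2008 5.968850
AL        2008 5.128384
AL        2009 6.927450
AL        2009 7.135734
ALCentral 2008 7.381087
ALCentral 2008 6.232927
ALCentral 2009 6.431800
ALCentral 2009 6.690246
ALCentral 2009 6.794144", header = TRUE)

# One list element per observed SubPop/Origin combination.
groups <- split(xy, interaction(xy$SubPop, xy$Origin), drop = TRUE)

# Elements are named "<SubPop>.<Origin>" and can be looked up by name.
names(groups)
groups[["ALCentral.2009"]]

# Passing a list of grouping vectors is an equivalent spelling;
# split() combines them with interaction() internally.
groups2 <- split(xy, list(xy$SubPop, xy$Origin), drop = TRUE)
```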

score 2 · Answer 1 · answered Jun 22 '17 at 17:35

There are many ways of doing this. Here are two:

xy <- read.table(text = "SubPop      Origin   grid_code
                 AL           2008   4.730380
                 AL           2008   5.552315
                 AL           2008   5.968850
                 AL           2008   5.128384
                 AL           2009   6.927450
                 AL           2009   7.135734
                 ALCentral    2008   7.381087
                 ALCentral    2008   6.232927
                 ALCentral    2009   6.431800
                 ALCentral    2009   6.690246
                 ALCentral    2009   6.794144", header = TRUE)

by(data = xy, INDICES = list(xy$SubPop, xy$Origin), FUN = function(x) x)

library(dplyr)

xy %>%
  group_by(SubPop, Origin)
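
Note that `group_by()` on its own only marks the groups: it returns a single grouped tibble rather than separate pieces. If a list is wanted, one option is `group_split()`, sketched below on the assumption that dplyr 0.8.0 or later is installed. The resulting list is unnamed, so `group_keys()` is used to see which combination each element corresponds to. The names `grouped`, `pieces` and `keys` are illustrative.

```r
library(dplyr)

xy <- read.table(text = "SubPop Origin grid_code
AL        2008 4.730380
AL        2008 5.552315
AL        2008 5.968850
AL        2008 5.128384
AL        2009 6.927450
AL        2009 7.135734
ALCentral 2008 7.381087
ALCentral 2008 6.232927
ALCentral 2009 6.431800
ALCentral 2009 6.690246
ALCentral 2009 6.794144", header = TRUE)

# Mark the groups; this is still one table.
grouped <- group_by(xy, SubPop, Origin)

# Break the grouped table into a list with one tibble per group.
pieces <- group_split(grouped)

# One row per group, in the same order as the elements of `pieces`.
keys <- group_keys(grouped)

length(pieces)
keys
```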

score -3 · Answer 2 · Shahab Einabadi · 2017-06-22T18:35:39.473

mylist <- split(df, interaction(df$SubPop,df$Origin))
indicator <- seq_len(length(mylist))
eval(parse(text = paste("L" , indicator , "<- ", "mylist[[", indicator, "]]", sep= "" )))

> L1
  SubPop Origin grid_code
1     AL   2008  4.730380
2     AL   2008  5.552315
3     AL   2008  5.968850
4     AL   2008  5.128384
> L2
     SubPop Origin grid_code
7 ALCentral   2008  7.381087
8 ALCentral   2008  6.232927
> L3
  SubPop Origin grid_code
5     AL   2009  6.927450
6     AL   2009  7.135734
> L4
      SubPop Origin grid_code
9  ALCentral   2009  6.431800
10 ALCentral   2009  6.690246
11 ALCentral   2009  6.794144

edited Jun 22 '17 at 18:35

answered Jun 22 '17 at 17:51

This answer has already been posted in the comments. What value has your post added? – lmo Jun 22 '17 at 18:10
sorry I did not see the posted comment – Shahab Einabadi Jun 22 '17 at 18:11
Now I added some value :) You can have a list for each part – Shahab Einabadi Jun 22 '17 at 18:36
A couple of points on the additional code: `seq_len(length(mylist))` can be simplified to `seq_along(mylist)`. The eval/parse paradigm is best avoided if possible, and in this case you should be able to replace it with `list2env`, like `list2env(mylist, envir=globalenv())`. – lmo Jun 22 '17 at 18:49
I hadn't seen the list2env function before. This is awesome! – Shahab Einabadi Jun 22 '17 at 18:56
For what it's worth, I usually prefer to work with lists of data.frames in these cases as I can use `lapply` or other functions to process them in an organized fashion. gregor's answer to [this post](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) provides a number of points in favor of working with lists of data.frames. – lmo Jun 22 '17 at 19:02
Thank you very much. The post is really helpful. – Shahab Einabadi Jun 22 '17 at 19:15
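
The suggestions in these comments can be sketched as follows, again on the sample rows. The list is kept and processed in one pass with `sapply()`, and `list2env()` is shown with a fresh environment (`env`, an illustrative choice) rather than `globalenv()`, so that the sketch does not overwrite objects in a working session. The names `xy` and `group_means` are also illustrative.

```r
xy <- read.table(text = "SubPop Origin grid_code
AL        2008 4.730380
AL        2008 5.552315
AL        2008 5.968850
AL        2008 5.128384
AL        2009 6.927450
AL        2009 7.135734
ALCentral 2008 7.381087
ALCentral 2008 6.232927
ALCentral 2009 6.431800
ALCentral 2009 6.690246
ALCentral 2009 6.794144", header = TRUE)

mylist <- split(xy, interaction(xy$SubPop, xy$Origin), drop = TRUE)

# Process every group in one pass instead of creating L1, L2, ...
group_means <- sapply(mylist, function(g) mean(g$grid_code))

# If separate objects are really wanted, list2env() assigns each list
# element to a variable named after it, with no eval(parse()) needed.
env <- new.env()
list2env(mylist, envir = env)
ls(env)
```

With `envir = globalenv()`, as in lmo's comment, the same call would create the objects in the workspace under names such as `AL.2008`.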
