
I have a data frame in which one column repeats the same string for a number of consecutive rows (the number varies). I'd like to split the data frame on each of these repeating names into separate data frames (the output can be a list). For example, for this data frame:

dat = data.frame(names=c('dog','dog','dog','dog','cat','cat'), value=c(1,2,3,4,5,5)) 

The output should be

   names value
   dog     1
   dog     2
   dog     3
   dog     4

and

   names value
   cat     5
   cat     5

I should mention there are thousands of different repeating names.

Frank
user3067923
  • If you're willing to install packages (`dplyr` or `data.table`), there are better ways of dealing with grouping variables than actually holding onto distinct data.frames. For example, in `data.table`, once `dat` is a keyed data.table (`setDT(dat); setkey(dat, names)`), you can use `dat[.("dog")]` to get that subset whenever you need it, and `dat[, do_stuff, by = names]` whenever you need to do the same operation on each group. (Not the downvoter.) – Frank May 22 '15 at 16:24
  • What is supposed to happen with `names=c('dog','dog','dog','dog','cat','cat', 'dog','dog')`? – IRTFM May 22 '15 at 16:28
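Frank's point about operating on groups instead of holding separate data frames also applies in base R. A minimal sketch using only base functions (no packages); the choice of `mean` as the per-group summary and the names `by_name`/`per_name` are illustrative assumptions:

```r
dat <- data.frame(names = c('dog', 'dog', 'dog', 'dog', 'cat', 'cat'),
                  value = c(1, 2, 3, 4, 5, 5))

# One row per distinct name, holding the mean of `value` for that name;
# no per-group data frame is ever created
by_name <- aggregate(value ~ names, data = dat, FUN = mean)
print(by_name)

# The same summary as a named vector, indexed by name
per_name <- tapply(dat$value, dat$names, mean)
per_name["dog"]
```

Both functions sort the groups by name, so `cat` comes before `dog` in the results.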

1 Answer


You can use the `split` function, which returns the pieces as a list. I think it is easier to keep the datasets in a list, since most operations can be performed on the list itself (e.g. with `lapply`).

 split(dat, dat$names)
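A short sketch of what `split` returns for the question's `dat`, and of applying the same operation to every piece with `sapply`. The variable names `pieces`, `rows_per_group` and `max_per_group` are illustrative only:

```r
dat <- data.frame(names = c('dog', 'dog', 'dog', 'dog', 'cat', 'cat'),
                  value = c(1, 2, 3, 4, 5, 5))

pieces <- split(dat, dat$names)   # list of data frames, one per distinct name

names(pieces)      # element names come from the (sorted) distinct names
pieces[["dog"]]    # pull out a single group by name

# Apply the same operation to every group without leaving the list
rows_per_group <- sapply(pieces, nrow)
max_per_group  <- sapply(pieces, function(d) max(d$value))
```

With thousands of distinct names the call is the same; the list simply has one element per name.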

If you instead want each run of consecutive names to be its own element, so that 'dog', 'cat', 'dog' gives a list with 3 elements (as in the example shown by @BondedDust), one option is

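 # Label each run of identical consecutive names 1, 2, 3, ...,
 # then expand the labels back to one per row
 indx <- inverse.rle(within.list(rle(as.character(dat$names)),
                values <- seq_along(values)))
 split(dat, indx)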
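As a cross-check, here is a sketch applying the run-based split to @BondedDust's example, alongside an equivalent run id built by flagging each position where the name changes and taking a cumulative sum. The helper `run_id` and the `value` column contents are hypothetical, chosen only for illustration:

```r
# @BondedDust's example: 'dog' appears in two separate runs
dat2 <- data.frame(names = c('dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog'),
                   value = 1:8)

# Run id as in the answer: replace each run's value with its run number
indx <- inverse.rle(within.list(rle(as.character(dat2$names)),
                                values <- seq_along(values)))

# Hypothetical helper: TRUE where a new run starts; cumsum numbers the runs
run_id <- function(x) {
  x <- as.character(x)
  if (length(x) == 0L) return(integer(0))  # avoid a spurious id for empty input
  cumsum(c(TRUE, x[-1L] != x[-length(x)]))
}

pieces <- split(dat2, run_id(dat2$names))
sapply(pieces, nrow)                                   # rows in each run
sapply(pieces, function(d) as.character(d$names[1]))  # name of each run
```

Unlike `split(dat, dat$names)`, this keeps the two 'dog' runs apart instead of pooling them.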

Or, using the development version of data.table at the time (v1.9.5+), we can use `rleid` to create a grouping variable

 library(data.table)  # v1.9.5+ (needed for rleid)
 setDT(dat)[, grp := rleid(names)]  # one run id per row, added by reference

and then use the standard data.table operations on the different groups by specifying `grp` as the grouping variable (`by = grp`).
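In data.table, such a per-run step would typically look like `dat[, run_total := sum(value), by = grp]`. The sketch below is a base-R analogue, so it runs without data.table installed: it builds the same kind of run id that `rleid` produces and uses `ave` to attach a per-run summary to every row. The example data (with a second 'dog' run) and the `run_total` column name are illustrative assumptions:

```r
dat <- data.frame(names = c('dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog'),
                  value = c(1, 2, 3, 4, 5, 5, 6, 7))

# Run-length group id: 1, 2, 3, ... for each run of identical consecutive
# names (a base-R stand-in for data.table::rleid(names))
r <- rle(as.character(dat$names))
dat$grp <- rep(seq_along(r$lengths), r$lengths)

# Per-run total copied back onto every row of that run,
# comparable to dat[, run_total := sum(value), by = grp] in data.table
dat$run_total <- ave(dat$value, dat$grp, FUN = sum)
print(dat)
```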

akrun
  • In case you still want separate data frames, you can do `df = split(dat, dat$names)` and then `attach(df)` to get all the data frames separately – Veerendra Gadekar May 22 '15 at 16:10
  • @VeerendraGadekar I would use `list2env(df, envir=.GlobalEnv)` after naming the list elements – akrun May 22 '15 at 16:11
  • I think after `attach()` the data frames in the list are already available in the environment, aren't they? – Veerendra Gadekar May 22 '15 at 16:15
  • @VeerendraGadekar It's just my preference to use `list2env` and avoid `attach`, because if we attach a single dataset, its columns also become visible from the global environment (though that is not the case here, with a list) – akrun May 22 '15 at 16:17
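The `list2env` route from the comments can be sketched as follows. Here the pieces go into a fresh environment rather than `.GlobalEnv`, an illustrative choice that keeps the workspace clean; passing `envir = .GlobalEnv` instead gives the behaviour described in the comment. `split` already names its elements after the group values, so no separate naming step is needed in this example. The name `grp_env` is hypothetical:

```r
dat <- data.frame(names = c('dog', 'dog', 'dog', 'dog', 'cat', 'cat'),
                  value = c(1, 2, 3, 4, 5, 5))

pieces <- split(dat, dat$names)   # elements are named "cat" and "dog"

# Copy each list element into its own variable inside a new environment;
# list2env returns that environment
grp_env <- list2env(pieces, envir = new.env())

ls(grp_env)                  # one variable per group
get("dog", envir = grp_env)  # the 'dog' rows as a plain data frame
grp_env$cat                  # environments also support $ access
```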