
I have a dummy dataset that I want to split into multiple data frames (one per group) based on a single variable, `group`. Here is the dummy data:

```r
df <- data.frame(group=c(2,2,1,2,2,3,1,1,3,1,3,1,3,2,3,2,3,3,3,3),
                 V1=c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 V2=c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 V3=c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4))
```

I need a separate data frame for each value of `group`, which I could create like this:

```r
group_1 <- subset(df, group==1)
group_2 <- subset(df, group==2)
group_3 <- subset(df, group==3)
```

Each new data frame's name should start with "group_" followed by the group value.

In the real world, I'm working with a file with millions of rows and hundreds of groups, so I would like to automate the above with a loop. The code below doesn't work, but it shows what I'm trying to achieve:

```r
# Pseudocode: R does not substitute the value of i into the name group_i
for (i in 1:3) {
  group_i <- subset(df, group == i)
}
```
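R cannot splice `i` into a variable name like this. A minimal sketch of a common alternative, assuming it is acceptable to keep the subsets in one named list rather than as separate variables: build each name as a string with `paste0()` and use it as a list key. The list name `groups` is hypothetical, and looping over `sort(unique(df$group))` instead of a hard-coded `1:3` is an assumption meant to cover hundreds of groups.

```r
df <- data.frame(group = c(2,2,1,2,2,3,1,1,3,1,3,1,3,2,3,2,3,3,3,3),
                 V1    = c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 V2    = c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 V3    = c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4))

# Collect the subsets in a named list (hypothetical name) instead of separate variables
groups <- list()
# Loop over the group values actually present in the data, in ascending order
for (g in sort(unique(df$group))) {
  # Build the name "group_1", "group_2", ... as a string and use it as the list key
  groups[[paste0("group_", g)]] <- subset(df, group == g)
}

names(groups)
```

Each subset is then available as `groups$group_1` or `groups[["group_1"]]`. Note that this loop rescans the whole data frame once per group; with millions of rows and hundreds of groups, `split()` groups the rows in one call and is likely to be considerably faster.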

Is this something that is possible to do in R? Or is there another function/package that can do this?

H.Cheung
  • Read about `?split` – markus Jul 27 '20 at 14:49
  • The way to achieve this with your approach is the [assign](https://stat.ethz.ch/R-manual/R-devel/library/base/html/assign.html) function. However, it is not advised in R; please read [this](https://stackoverflow.com/questions/17559390/why-is-using-assign-bad). – maydin Jul 27 '20 at 14:57
  • @maydin Thanks, I adapted the assign function into the following code: `for (i in 1:3) { assign(paste("group", i, sep = "_"), data.frame(subset(df, group==i))) }`. I read the article. I'm no expert, but I get the idea: assign is discouraged because other functions are better suited to the R way of doing things. Thanks – H.Cheung Jul 27 '20 at 15:21
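A sketch that generalizes the `assign()` loop from the comment above to whatever groups are present, instead of the hard-coded `1:3`, so it would also cover hundreds of groups. It keeps the question's `group_` naming; the `data.frame()` wrapper is dropped because `subset()` already returns a data frame. Keeping the results in a list is still generally preferred over `assign()`.

```r
df <- data.frame(group = c(2,2,1,2,2,3,1,1,3,1,3,1,3,2,3,2,3,3,3,3),
                 V1    = c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 V2    = c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 V3    = c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4))

# Loop over the group values that actually occur, not a fixed 1:3
for (g in sort(unique(df$group))) {
  # assign() creates a variable whose name is built at run time, e.g. "group_2"
  assign(paste0("group_", g), subset(df, group == g))
}

# List the variables that were created in the current environment
ls(pattern = "^group_")
```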

1 Answer


Try this (all the data frames will be created in the global environment):

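```r
# Split df into a list of data frames, one per value of group
Liste <- split(df, df$group)
# Rename the list elements to group_1, group_2, ...
names(Liste) <- paste0('group_', names(Liste))
# Copy each list element into the global environment as its own variable
list2env(Liste, envir = .GlobalEnv)
```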
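If "files" in the question means files on disk, here is a minimal sketch that writes each element of the same `split()` list to its own CSV file. The output folder is an assumption: `tempdir()` is used only so the example is safe to run, and should be replaced with a real path. `write.csv()`, `file.path()` and `list.files()` are base R.

```r
df <- data.frame(group = c(2,2,1,2,2,3,1,1,3,1,3,1,3,2,3,2,3,3,3,3),
                 V1    = c(2,2,7,5,5,5,2,7,2,2,4,4,4,3,4,5,3,3,6,6),
                 V2    = c(4,3,4,4,4,4,4,4,5,2,5,5,5,4,3,2,3,4,2,4),
                 V3    = c(4,4,1,3,4,2,4,5,5,5,5,4,5,4,3,3,4,4,4,4))

# Same list as in the answer: one data frame per group, named group_1, group_2, ...
Liste <- split(df, df$group)
names(Liste) <- paste0("group_", names(Liste))

# Assumed output folder; a temporary directory keeps the sketch side-effect free
out_dir <- tempdir()
for (nm in names(Liste)) {
  # Write e.g. group_1.csv without the original row numbers
  write.csv(Liste[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}

list.files(out_dir, pattern = "^group_.*\\.csv$")
```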
Duck