I have a rather large dataset, test_data, with more than 30,000 observations and 20 variables.
I would like to split it into smaller subsets based on the set number, which is stored in test_data$set. The size of each subset varies (as shown below).

For a small dataset, I would subset the rows as follows:

library(dplyr)  # provides mutate()
test_data <- data.frame(measurement = c(2, 34, 5, 6, 7, 38, 3, 4, 29, 11, 12, 4, 5, 6, 91, 13, 13, 13, 12))
test_data <- mutate(test_data, set = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))

set1 <- subset(test_data, set == 1)
set2 <- subset(test_data, set == 2)
set3 <- subset(test_data, set == 3)
set4 <- subset(test_data, set == 4)

But since my dataset is huge, I am looking for a way to create the subsets without typing a separate subset command for each one. Does anyone have experience with this?

Jael

1 Answer

The easiest approach is split, which splits the data into a list of data.frames, one per value of set:

lst <- split(test_data, test_data$set)

and then do the processing within the list. Creating lots of objects in the global environment is not recommended. For group-by operations, group_by from dplyr or the by argument in data.table would be fast.
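To make "processing within the list" concrete, here is a minimal base R sketch using the toy data from the question. The names set3, rows_per_set and mean_per_set are only illustrative, and taking the mean of measurement is an assumed example of a per-subset computation, not something the question asks for.

```r
# Same toy data as in the question, built with base R only
test_data <- data.frame(
  measurement = c(2, 34, 5, 6, 7, 38, 3, 4, 29, 11, 12, 4, 5, 6, 91, 13, 13, 13, 12),
  set = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4)
)

# One data.frame per value of `set`; the list elements are named "1", "2", "3", "4"
lst <- split(test_data, test_data$set)

# Pull out a single subset by name (same rows as subset(test_data, set == 3))
set3 <- lst[["3"]]

# Apply the same computation to every subset without naming each one
rows_per_set <- sapply(lst, nrow)
mean_per_set <- sapply(lst, function(d) mean(d$measurement))
```

Because every subset lives in one list, adding a fifth or a five-hundredth set requires no new code; lapply or sapply simply visits one more element.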

akrun
  • Thank you for helping. So if I understand correctly, there is no one-line way (like the one you showed) to do this without ending up with a list? – Jael Feb 21 '18 at 09:46
  • @Jael It is better to keep the data in a `list` rather than creating individual datasets. But from the above `lst`, you can still create separate dataset objects, i.e. `list2env(setNames(lst, paste0("set", names(lst))), envir = .GlobalEnv)`, though I wouldn't recommend doing it that way. – akrun Feb 21 '18 at 09:48
  • Okay, I see, thank you very much :-) – Jael Feb 21 '18 at 09:49
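
For readers who want to try the `list2env` route from the comments, here is a hedged sketch of what that one-liner does, broken into steps. As a variation, it writes into a fresh environment (the name subsets_env is hypothetical) rather than `.GlobalEnv`, so the workspace stays uncluttered; passing `envir = .GlobalEnv` instead reproduces the comment's version.

```r
# Toy data from the question, split into a list as in the answer
test_data <- data.frame(
  measurement = c(2, 34, 5, 6, 7, 38, 3, 4, 29, 11, 12, 4, 5, 6, 91, 13, 13, 13, 12),
  set = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4)
)
lst <- split(test_data, test_data$set)

# Rename the list elements from "1".."4" to "set1".."set4"
named <- setNames(lst, paste0("set", names(lst)))

# Copy each list element into an environment as its own object.
# Assumption for this sketch: a dedicated environment instead of .GlobalEnv.
subsets_env <- list2env(named, envir = new.env())

# The subsets are now individual objects inside subsets_env
ls(subsets_env)
set1 <- get("set1", envir = subsets_env)
```

The answer's advice still applies: once the subsets are separate objects, any later step has to name each one again, which is exactly the repetition the list avoids.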