I have a fairly large data frame and I'm trying to divide this data frame into multiple smaller ones. Suppose I have this data frame called df:

   Patient   Status    cancer
1        1  treated  melanoma
2        2 deceased  melanoma
3        3 deceased carcinoma
4        4  treated  lymphoma
5        5 deceased  melanoma
6        6  treated carcinoma
7        7 deceased  lymphoma
8        8 deceased carcinoma
9        9  treated  melanoma
10      10  treated  melanoma

I want to split it into separate data frames based on the "cancer" column and store each one in its own object, as follows:

  Patient   Status    cancer
1       3 deceased carcinoma
2       6  treated carcinoma
3       8 deceased carcinoma

  Patient   Status   cancer
1       1  treated melanoma
2       2 deceased melanoma
3       5 deceased melanoma
4       9  treated melanoma
5      10  treated melanoma

  Patient   Status   cancer
1       4  treated lymphoma
2       7 deceased lymphoma

I've managed to write the code below using dplyr's `filter` function, and it does the job, but because my initial data frame is pretty large, the loop chokes my computer:

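library(dplyr)

# One data frame per cancer type, each assigned to a global variable.
# factor() is applied first so this also works when cancer is a character column.
factors <- levels(factor(df$cancer))
for (i in factors) {
  assign(i, filter(df, cancer == i), envir = .GlobalEnv)
}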

I would appreciate it if someone could suggest a more efficient alternative.

Best regards.

NelsonGon
Moosa
  • Just use `split`: `split(df, df$cancer)`. – nicola May 19 '19 at 05:40
  • Why did you change your answer to a comment? It was a proper and correct answer. – Bruce Schardt May 19 '19 at 05:50
  • Thank you @nicola. If I want to save every element of the list returned by `split()` as its own object, do I have any better option than iterating? I mean: `xx = split(df, df$cancer)` `for (i in names(xx)) { assign(i, as.data.frame(xx[[i]]), envir = .GlobalEnv) }` – Moosa May 19 '19 at 06:12
  • Try `list2env`. See here: https://stackoverflow.com/questions/30516325/converting-a-list-of-data-frames-into-individual-data-frames-in-r – yarnabrina May 19 '19 at 06:28
  • Amazing. Thank you, @yarnabrina, for your great help. – Moosa May 19 '19 at 07:53
  • @BruceSchardt Because it's certainly a dupe, so I thought it was worth just a comment IMO. – nicola May 20 '19 at 10:28
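
The `split` and `list2env` suggestions from the comments can be combined roughly as follows. This is a minimal base R sketch, not a benchmarked solution: the example `df` is rebuilt from the table in the question, and `by_cancer` is just an illustrative name.

```r
# Example data reconstructed from the question
df <- data.frame(
  Patient = 1:10,
  Status  = c("treated", "deceased", "deceased", "treated", "deceased",
              "treated", "deceased", "deceased", "treated", "treated"),
  cancer  = c("melanoma", "melanoma", "carcinoma", "lymphoma", "melanoma",
              "carcinoma", "lymphoma", "carcinoma", "melanoma", "melanoma")
)

# split() partitions the rows once and returns a named list,
# one data frame per distinct value of df$cancer
by_cancer <- split(df, df$cancer)

# Reset row names so each piece is numbered from 1, as in the desired output
by_cancer <- lapply(by_cancer, function(d) {
  rownames(d) <- NULL
  d
})

# Optional: copy every list element into its own global variable
# (carcinoma, lymphoma, melanoma) without writing an explicit loop
list2env(by_cancer, envir = .GlobalEnv)
```

Keeping the pieces in the list (`by_cancer$melanoma`, or `lapply(by_cancer, ...)` for per-group work) is often easier to manage than many separate global objects, so the `list2env` step may not be needed at all.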

1 Answer

If operations on your data frames are slow in general, consider switching to the data.table package. You may be surprised by the increase in performance.
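
As a rough illustration of what that could look like for this question, here is a small sketch. It assumes the data.table package is installed; the example table is rebuilt from the question, and `dt` and `pieces` are illustrative names. Whether it is actually faster on your data is something you would need to measure.

```r
library(data.table)

# Example data reconstructed from the question, built directly as a data.table
dt <- data.table(
  Patient = 1:10,
  Status  = c("treated", "deceased", "deceased", "treated", "deceased",
              "treated", "deceased", "deceased", "treated", "treated"),
  cancer  = c("melanoma", "melanoma", "carcinoma", "lymphoma", "melanoma",
              "carcinoma", "lymphoma", "carcinoma", "melanoma", "melanoma")
)

# data.table provides a split() method with a `by` argument;
# it returns a named list with one data.table per cancer type
pieces <- split(dt, by = "cancer")

# Individual groups are then available by name
pieces[["melanoma"]]
```

For an existing data frame, `as.data.table(df)` returns a data.table copy, and `setDT(df)` converts it in place.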

zappee
Carsten