I have a large data frame that I would like to split into several smaller data frames, based on the value in the Name column.

head(DATAFILE)

# Age    Site    Name    1    2    3    4    5

# 10     1      Orange   0    2    1    0    1
# 10     1      Apple    2    5    4    0    2
# 10     1      Banana   0    0    0    0    2
# 20     2      Orange   0    2    1    0    0
# 20     2      Apple    0    2    0    7    1
# 20     2      Banana   0    4    1    3    6

An example of the desired output:

head(Orange)

# Age    Site    Name    1    2    3    4    5

# 10     1      Orange   0    2    1    0    1
# 20     2      Orange   0    2    1    0    0

I have tried

SPLIT.DATA <- split(DATAFILE, DATAFILE$Name, drop = FALSE)

But this returns a large list, and I would like individual data frames so that I can save each one as a .csv file. So I would like either a better way of dividing the original data frame, or a way to further divide the SPLIT.DATA list.

EcologyTom
  • It is better to keep it in a `list` and loop through `SPLIT.DATA` to write all the csv files at once, instead of having several objects in the global environment and then saving them individually, i.e. `lapply(names(SPLIT.DATA), function(nm) write.csv(SPLIT.DATA[[nm]], paste0(nm, ".csv"), row.names = FALSE, quote = FALSE))` – akrun Aug 23 '16 at 17:09
  • We recommend using `dput` to share data like this in R questions (see the R tag description), because that way it's easily reproducible by people who want to help you. – Hack-R Aug 23 '16 at 17:11
  • I already updated the comment. Please check it. – akrun Aug 23 '16 at 17:12
  • Thanks @akrun, that works nicely. Is there a way to suppress the .Rdata files which are produced at the same time? – EcologyTom Aug 23 '16 at 17:17
  • @EcologyTom Are you using `Rstudio`? – akrun Aug 23 '16 at 17:17
  • I am not working on Rstudio, but I think the `.Rdata` files may not have any real impact. When you close the R session without saving, these temporary files will not be saved. – akrun Aug 23 '16 at 17:20
  • Thanks for your help. I will just delete the .Rdata files. Do you want to write your comments up as an answer so that I can accept it? – EcologyTom Aug 23 '16 at 17:22
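
As a minimal sketch of the `dput` suggestion in the comments above: the data frame below is a hand-typed reconstruction of the six rows shown in the question (an assumption, since the real DATAFILE is larger), and the temporary file is only there to show the round trip.

```r
# Reconstruction of the six example rows from the question (assumed values;
# check.names = FALSE keeps the numeric column names 1..5 as they are).
DATAFILE <- data.frame(
  Age  = c(10, 10, 10, 20, 20, 20),
  Site = c(1, 1, 1, 2, 2, 2),
  Name = c("Orange", "Apple", "Banana", "Orange", "Apple", "Banana"),
  `1`  = c(0, 2, 0, 0, 0, 0),
  `2`  = c(2, 5, 0, 2, 2, 4),
  `3`  = c(1, 4, 0, 1, 0, 1),
  `4`  = c(0, 0, 0, 0, 7, 3),
  `5`  = c(1, 2, 2, 0, 1, 6),
  check.names = FALSE
)

# dput() prints R code that recreates the object; paste that text into a question.
dput(DATAFILE)

# Round trip through a temporary file to show the text is enough to rebuild the data.
tmp <- tempfile(fileext = ".R")
dput(DATAFILE, file = tmp)
restored <- dget(tmp)
all.equal(restored, DATAFILE)
```

Anyone answering can then run the pasted `dput` output directly instead of retyping the table.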

1 Answer

It is better to save the datasets directly from the list output of split itself instead of creating individual objects in the global environment. We loop over the names of 'SPLIT.DATA' and write each list element to its own csv file, building the file name by pasting the element's name to ".csv" in the write.csv call.

lapply(names(SPLIT.DATA), function(nm) 
   write.csv(SPLIT.DATA[[nm]], paste0(nm, ".csv"), row.names = FALSE, quote = FALSE))
akrun
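
For anyone who wants to try this without touching their working directory, here is a self-contained sketch of the same approach. The cut-down data frame, the `Count` column, and the `out_dir` folder under `tempdir()` are assumptions for illustration, not part of the original question; `invisible()` only hides the list of `NULL`s that `lapply` would otherwise print.

```r
# Small stand-in for DATAFILE (assumed data, one count column instead of five).
DATAFILE <- data.frame(
  Age   = c(10, 10, 10, 20, 20, 20),
  Site  = c(1, 1, 1, 2, 2, 2),
  Name  = c("Orange", "Apple", "Banana", "Orange", "Apple", "Banana"),
  Count = c(0, 2, 0, 0, 0, 0)
)

# One list element per distinct value of Name.
SPLIT.DATA <- split(DATAFILE, DATAFILE$Name)

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "split_csv")
dir.create(out_dir, showWarnings = FALSE)

# Write each element to <Name>.csv; invisible() suppresses the printed list of NULLs.
invisible(lapply(names(SPLIT.DATA), function(nm) {
  write.csv(SPLIT.DATA[[nm]], file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}))

# Check what was written and read one file back.
list.files(out_dir)
orange <- read.csv(file.path(out_dir, "Orange.csv"))
orange
```

Swapping `out_dir` for a real folder path writes the files wherever they are needed.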
  • To keep the individual data.frames from 'polluting' the global env. `list2env` is very useful. – hvollmeier Aug 23 '16 at 18:24
  • @hvollmeier Yes, you can use `list2env`, but it is not recommended, as most of the operations can be done within the `list` – akrun Aug 23 '16 at 18:27
  • @akrun Hey akrun, could you give me an idea of when to use `lapply`, and in which cases? We use it in a great many cases, but I would like to understand when to go for it, irrespective of what function we are using inside it. – Sowmya S. Manian Aug 23 '16 at 18:49
  • @SowmyaS.Manian You can apply `lapply` to a `list`, a `vector` or a `data.frame` (as a data.frame is a `list` whose elements all have equal length). The difference is that it always returns a `list`, whereas `sapply` can return a `matrix`, as it uses `simplify = TRUE` by default. Then, we can also use `vapply` (for fast processing), which has additional checks as well. `rapply` is for recursive application of a function, and `mapply/Map` are for applying functions to corresponding elements of vectors/data.frame columns/lists. – akrun Aug 23 '16 at 18:54
  • Wow, `mapply/Map` would be cool!! I read about the `apply` family functions long ago, mostly about their input and output types, and also about `tapply` and some other mix-ups. But this brief helps a lot. Mostly I prefer running more examples and checking how they work to understand each bit. Thanks a lot..:) – Sowmya S. Manian Aug 23 '16 at 19:05
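
The differences described in the comments above can be checked with a short sketch. The list `x` and the variable names are made up for illustration; only base R functions are used.

```r
# Toy input (assumed data): a named list of two numeric vectors.
x <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# lapply always returns a list, one element per input element.
as_list <- lapply(x, mean)

# sapply simplifies when it can: a named vector here ...
as_vector <- sapply(x, mean)

# ... and a matrix when every result has the same length greater than 1.
as_matrix <- sapply(x, range)

# vapply is like sapply, but the expected type and length of each result
# are declared up front and an error is raised if a result does not match.
checked <- vapply(x, mean, numeric(1))

# Map and mapply walk over several inputs in parallel, element by element.
pair_list   <- Map(function(u, v) u + v, x$a, x$b)      # returns a list
pair_vector <- mapply(function(u, v) u + v, x$a, x$b)   # simplified to a vector

# rapply applies a function recursively to the leaves of a nested list.
nested <- rapply(list(1, list(2, 3)), function(v) v * 10, how = "unlist")

str(as_list)
as_vector
as_matrix
pair_vector
nested
```

A rough rule of thumb that follows from this: reach for `lapply` when the results should stay in a list (as with the split data frames here), and for `vapply` when a plain vector of a known type is wanted.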