
I have a data frame with monthly temperature data for several locations:

    > df4[1:36,]
           location    variable cut month year freq
    1    Adamantina temperature  10   Jan 1981 21.0
    646  Adamantina temperature  10   Feb 1981 20.5
    1291 Adamantina temperature  10   Mar 1981 21.5
    1936 Adamantina temperature  10   Apr 1981 21.5
    2581 Adamantina temperature  10   May 1981 24.0
    3226 Adamantina temperature  10   Jun 1981 21.5
    3871 Adamantina temperature  10   Jul 1981 22.5
    4516 Adamantina temperature  10   Aug 1981 23.5
    5161 Adamantina temperature  10   Sep 1981 19.5
    5806 Adamantina temperature  10   Oct 1981 21.5
    6451 Adamantina temperature  10   Nov 1981 23.0
    7096 Adamantina temperature  10   Dec 1981 19.0
    2        Adolfo temperature  10   Jan 1981 24.0
    647      Adolfo temperature  10   Feb 1981 20.0
    1292     Adolfo temperature  10   Mar 1981 24.0
    1937     Adolfo temperature  10   Apr 1981 23.0
    2582     Adolfo temperature  10   May 1981 18.0
    3227     Adolfo temperature  10   Jun 1981 21.0
    3872     Adolfo temperature  10   Jul 1981 22.0
    4517     Adolfo temperature  10   Aug 1981 19.0
    5162     Adolfo temperature  10   Sep 1981 19.0
    5807     Adolfo temperature  10   Oct 1981 24.0
    6452     Adolfo temperature  10   Nov 1981 24.0
    7097     Adolfo temperature  10   Dec 1981 24.0
    3         Aguai temperature  10   Jan 1981 24.0
    648       Aguai temperature  10   Feb 1981 20.0
    1293      Aguai temperature  10   Mar 1981 22.0
    1938      Aguai temperature  10   Apr 1981 20.0
    2583      Aguai temperature  10   May 1981 21.5
    3228      Aguai temperature  10   Jun 1981 20.5
    3873      Aguai temperature  10   Jul 1981 24.0
    4518      Aguai temperature  10   Aug 1981 23.5
    5163      Aguai temperature  10   Sep 1981 18.5
    5808      Aguai temperature  10   Oct 1981 21.0
    6453      Aguai temperature  10   Nov 1981 22.0
    7098      Aguai temperature  10   Dec 1981 23.5

What I need to do is to programmatically split this data frame by location and create a .Rdata file for every location.

In the example above, I would have three different files - Adamantina.Rdata, Adolfo.Rdata and Aguai.Rdata - containing all the columns but only the rows corresponding to those locations.

It needs to be efficient and programmatic, because in my actual data I have about 700 different locations and about 50 years of data for every location.

Thanks in advance.

thiagoveloso
  • What have you tried? Where are you stuck? `for (loc in unique(df4$location)) save(df4[df4$location == loc], file = paste0(loc, ".Rdata"))` should work. For marginal speed gains (for this simple operation) you could use `dplyr::do` or `data.table` instead, but why bother? – Gregor Thomas Oct 30 '15 at 00:45
  • Possible duplicate of [split dataframe into multiple output files in r](http://stackoverflow.com/questions/10002021/split-dataframe-into-multiple-output-files-in-r) –  Oct 30 '15 at 00:52
  • @Gregor I get an error message `Error in save(df4[df4$location == loc], file = paste0("/disk1/Project/Shiny/data/", : object ‘df4[df4$location == loc]’ not found` when I try your suggestion – thiagoveloso Oct 30 '15 at 04:14
  • I forgot a comma, `df4[df4$location == loc, ]`. – Gregor Thomas Oct 30 '15 at 14:56

2 Answers


This is borrowing from a previous answer, but I don't believe that answer does what you want.

First, as they suggest, you want to split up your data set.

    splitData <- split(df4, df4$location)
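To see what split() hands back, here is a small sketch on a hypothetical cut-down stand-in for df4 (only January and February for the three locations above, and only some of the columns). The result is a plain named list of data frames, one element per location; it does not create separate objects in the workspace.

```r
# Hypothetical cut-down stand-in for df4: two months, three locations
df4 <- data.frame(
  location = rep(c("Adamantina", "Adolfo", "Aguai"), each = 2),
  month    = rep(c("Jan", "Feb"), times = 3),
  year     = 1981,
  freq     = c(21.0, 20.5, 24.0, 20.0, 24.0, 20.0),
  stringsAsFactors = FALSE
)

splitData <- split(df4, df4$location)

class(splitData)          # "list"
names(splitData)          # "Adamantina" "Adolfo" "Aguai"
sapply(splitData, nrow)   # rows per location (2 each in this toy data)
splitData[["Adolfo"]]     # all columns, only the Adolfo rows
```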

Now, to go through this list and save each data set one by one, loop over its names:

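    allNames <- names(splitData)
    for (thisName in allNames) {
        # one file per location, e.g. "Adamantina.Rdata"
        saveName <- paste0(thisName, '.Rdata')
        # saveRDS() stores a single object; read it back with readRDS(), not load()
        saveRDS(splitData[[thisName]], file = saveName)
    }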
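A file written by saveRDS() holds one unnamed object, so it is read back with readRDS() and assigned to whatever name you like; load() is meant for files written by save(). The .Rdata extension does not change the format (.rds is the usual convention for these files). A minimal round-trip sketch, assuming a hypothetical toy df4 and writing into tempdir() instead of a real project folder:

```r
# Hypothetical toy data standing in for df4
df4 <- data.frame(
  location = rep(c("Adamantina", "Adolfo", "Aguai"), each = 2),
  month    = rep(c("Jan", "Feb"), times = 3),
  freq     = c(21.0, 20.5, 24.0, 20.0, 24.0, 20.0),
  stringsAsFactors = FALSE
)
splitData <- split(df4, df4$location)
out_dir <- tempdir()  # stand-in for the real output folder

for (thisName in names(splitData)) {
  saveRDS(splitData[[thisName]],
          file = file.path(out_dir, paste0(thisName, ".Rdata")))
}

# Read one location back under a name of your choosing
adolfo <- readRDS(file.path(out_dir, "Adolfo.Rdata"))
identical(adolfo, splitData[["Adolfo"]])
```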
Cliff AB
  • Thanks for the suggestion! It works with `saveRDS`, but it returns an error when I try to use only `save` instead. It says `Error in save(splitData[[thisName]], file = saveName) : object ‘splitData[[thisName]]’ not found`. – thiagoveloso Oct 30 '15 at 04:17
  • @thiagoveloso: you *could* do this with `save`, but for a few reasons I personally prefer `saveRDS`, especially given your goal of having separate `.Rdata` files. If you really wanted, you could do something crazy like `.GlobalEnv[[thisName]] <- splitData[[thisName]]; eval(substitute(save(NAME, FILENAME), list(NAME = as.name(thisName), FILENAME = saveName) ) )`. But this is really trying to shoehorn `save()` in and I definitely do not recommend it. There might be a better way to use `save` though. – Cliff AB Oct 30 '15 at 04:35
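The `save()` errors in this thread happen because save() takes the *names* of objects through `...`, so an expression such as `splitData[[thisName]]` is looked up as an object literally called that. save() also accepts a character vector of names through its `list` argument, plus an `envir` to find them in, which avoids both eval/substitute and writing into the global environment. A sketch of that idea (not from the original answers), assuming a hypothetical toy df4 and tempdir() as the output folder; each file then holds one object named after its location and can be restored with load():

```r
# Hypothetical toy data standing in for df4
df4 <- data.frame(
  location = rep(c("Adamantina", "Adolfo", "Aguai"), each = 2),
  month    = rep(c("Jan", "Feb"), times = 3),
  freq     = c(21.0, 20.5, 24.0, 20.0, 24.0, 20.0),
  stringsAsFactors = FALSE
)
splitData <- split(df4, df4$location)
out_dir <- tempdir()  # stand-in for the real output folder

# Put each data frame into a private environment under its location name
env <- list2env(splitData, envir = new.env())

for (thisName in names(splitData)) {
  # list = object names as strings, envir = where to find them
  save(list = thisName, envir = env,
       file = file.path(out_dir, paste0(thisName, ".Rdata")))
}

# load() recreates an object called "Aguai" in the chosen environment
restored <- new.env()
loaded <- load(file.path(out_dir, "Aguai.Rdata"), envir = restored)
print(loaded)
```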

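To split the data frame, use split(df4, df4$location). It returns a list of data frames named Adamantina, Adolfo, Aguai, etc.; list2env(split(df4, df4$location), envir = .GlobalEnv) turns that list into separate data frames with those names.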

And to save these new data frames into a single locations.RData file, use save(Adamantina, Adolfo, Aguai, file="locations.RData"). save.image(file="filename.RData") will save everything in the current R session (the global workspace) into filename.RData.
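Typing every name into save() does not scale to hundreds of locations. The same single-file result can be produced programmatically by passing the names as a character vector through save()'s `list` argument. A sketch, assuming a hypothetical toy df4 and writing to tempdir():

```r
# Hypothetical toy data standing in for df4
df4 <- data.frame(
  location = rep(c("Adamantina", "Adolfo", "Aguai"), each = 2),
  month    = rep(c("Jan", "Feb"), times = 3),
  freq     = c(21.0, 20.5, 24.0, 20.0, 24.0, 20.0),
  stringsAsFactors = FALSE
)
pieces <- split(df4, df4$location)

# One object per location, held in a private environment instead of the workspace
env <- list2env(pieces, envir = new.env())

f <- file.path(tempdir(), "locations.RData")
save(list = names(pieces), envir = env, file = f)

# Loading restores Adamantina, Adolfo and Aguai as separate data frames
check <- new.env()
load(f, envir = check)
ls(check)
```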

You can read more about save and save.image on their shared help page (?save).

Edit:

If the number of splits is too large to name each data frame by hand, use this approach instead:

    locations <- split(df4, df4$location)
    # 'file =' must be named; a bare string would be taken as an object name
    save(locations, file = "locations.RData")

Loading locations.RData with load() then restores a single list called locations, with one data frame per location.
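A short sketch of the load side, assuming a hypothetical toy df4 and tempdir(): load() recreates the single object `locations`, and each location's data frame is one element of that list.

```r
# Hypothetical toy data standing in for df4
df4 <- data.frame(
  location = rep(c("Adamantina", "Adolfo", "Aguai"), each = 2),
  month    = rep(c("Jan", "Feb"), times = 3),
  freq     = c(21.0, 20.5, 24.0, 20.0, 24.0, 20.0),
  stringsAsFactors = FALSE
)
locations <- split(df4, df4$location)
f <- file.path(tempdir(), "locations_list.RData")
save(locations, file = f)

e <- new.env()
load(f, envir = e)            # creates e$locations
names(e$locations)            # one element per location
e$locations[["Adamantina"]]   # the Adamantina rows only
```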

narendra-choudhary
  • Good suggestion, but the idea is to save only the individual data frames, and ideally not by manually specifying their names (there are almost 700 of them). – thiagoveloso Oct 30 '15 at 04:19