
I have a large data frame that consists of data that looks something like this:

        date    w    x    y    z    region
1    2012 01    21   43   12    3   NORTH
2    2012 02    32   54   21   16   NORTH
3    2012 03    14   32   65   32   NORTH
4    2012 04    65   33   75   21   NORTH
:        :      :    :    :    :       :
:        :      :    :    :    :       :
12   2012 12    32   58   53   17   NORTH
13   2012 01    12   47   43   23   SOUTH
14   2012 02    87   43   21   76   SOUTH
:        :      :    :    :    :       :
25   2012 01    12   46   84   29    EAST
26   2012 02    85   29   90   12    EAST
:        :      :    :    :    :       :
:        :      :    :    :    :       :

I want to extract the sections of the data that share the same date value. For example, to do this just for 2012 01, I would create a subset of the data:

data_1 <- subset(data, date == "2012 01")

This gives me all the data for 2012 01, and I then apply a function to it. I would like to apply my function to every such subset of my data. Ideally I would loop through my large data frame, extract the data for 2012 01, 2012 02, 2012 03, 2012 04, ..., and apply the function to each of these subsets separately.

However, this should still work if the length of my data frame changes. The dates will not always run from 2012 01 to 2012 12; the range may vary, so sometimes the code may be applied to data from, for example, 2011 03 to 2013 01.

userk
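A minimal sketch, on made-up data, of why nothing has to be hard-coded to 2012 01 - 2012 12: `unique()` returns whichever dates are actually present, and because the dates are zero-padded "YYYY MM" strings, `sort()` puts them in chronological order. The data values and the names `month_starts`, `dates` and `periods` are hypothetical.

```r
# Hypothetical data covering 2011 03 - 2011 05 instead of 2012 01 - 2012 12.
month_starts <- seq(as.Date("2011-03-01"), by = "month", length.out = 3)
dates <- format(month_starts, "%Y %m")  # "2011 03" "2011 04" "2011 05"

data <- data.frame(
  date   = rep(rev(dates), times = 2),  # deliberately out of order
  x      = c(43, 54, 32, 47, 43, 58),
  region = rep(c("NORTH", "SOUTH"), each = 3),
  stringsAsFactors = FALSE
)

# Whatever range of dates is present, unique() finds it; zero-padded
# "YYYY MM" strings sort chronologically as plain text.
periods <- sort(unique(data$date))
print(periods)
```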

5 Answers


Loop through each unique date and build the subset.

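# the column is `date` (lower case); unlist() is not needed on a plain column
uniq <- unique(data$date)
# seq_along() is safe even if there are no dates at all
for (i in seq_along(uniq)) {
    data_1 <- subset(data, date == uniq[i])
    # your desired function
}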
TylerDurden
  • Will each subset have a unique name? From what I see you're going to end up putting each subset in one dataframe. Thx – BlackHat Jul 01 '15 at 20:19
  • No, each iteration just overwrites `data_1`; the user can then apply whatever function they like to that data frame and choose where to store the results. – TylerDurden Jul 03 '15 at 09:25
  • @TylerDurden It looks like what I need. What if I want to make subsets based on both region and date? e.g. data_1 <- subset(data, date == "2012 01" & region == "NORTH") – Polar Bear Aug 27 '16 at 08:29
  • @PolarBear That's a new question. Just google it; the answer is straightforward: https://www.google.com/#safe=active&q=subset+data+r+2+conditions – TylerDurden Aug 29 '16 at 13:13
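Following up on the comments: a sketch of the same loop that keeps each result in a named list instead of overwriting it, plus a subset on two conditions combined with `&`. The toy data, the use of `mean(data_1$x)` as a stand-in for "your desired function", and the names `results` and `north_jan` are assumptions for illustration.

```r
# Hypothetical toy data; column names follow the question.
data <- data.frame(
  date   = c("2012 01", "2012 01", "2012 02", "2012 02"),
  x      = c(43, 47, 54, 43),
  region = c("NORTH", "SOUTH", "NORTH", "SOUTH"),
  stringsAsFactors = FALSE
)

uniq <- unique(data$date)
results <- vector("list", length(uniq))  # one slot per date
names(results) <- uniq                   # so each result is labelled by its date

for (i in seq_along(uniq)) {
  data_1 <- subset(data, date == uniq[i])
  results[[i]] <- mean(data_1$x)  # stand-in for "your desired function"
}

# Subsetting on two conditions at once: each condition names its column.
north_jan <- subset(data, date == "2012 01" & region == "NORTH")
```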

Is this what you want?

df_list <- split(data, as.factor(data$date))

statquant
  • this is perfect! Such a simple answer for something I thought would be much more complex, thank you – userk Aug 22 '13 at 14:14
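A short sketch of what to do with the list next: `sapply()` (or `lapply()`, if you want a list back) applies a function to every data frame in it, whatever dates the data covers. `split()` converts its grouping argument to a factor itself, so `as.factor()` is optional. The toy data and the helper `my_summary` are hypothetical.

```r
# Hypothetical toy data in the question's layout (only the columns used here).
data <- data.frame(
  date = c("2012 01", "2012 02", "2012 01", "2012 02"),
  x    = c(43, 54, 47, 43),
  stringsAsFactors = FALSE
)

df_list <- split(data, data$date)  # one data frame per date, named by date

# Hypothetical per-subset function: row count and mean of x.
my_summary <- function(d) c(n = nrow(d), mean_x = mean(d$x))

# sapply() simplifies to a matrix: one column per date, one row per statistic.
res <- sapply(df_list, my_summary)
print(res)
```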

After subsetting your dataset by date, suppose the function you want to apply to each subset computes the mean of column x. You could do it this way (df is your data frame):

 library(plyr)
 ddply(df, .(date), summarize, mean = mean(x))
Mayou
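For comparison, a base R sketch of the same per-date mean without plyr, using `aggregate()` with a formula (a swapped-in technique, not the one used in the answer). The toy data is made up.

```r
# Hypothetical toy data in the question's layout (only the columns used here).
df <- data.frame(
  date = c("2012 01", "2012 02", "2012 01", "2012 02"),
  x    = c(43, 54, 47, 43),
  stringsAsFactors = FALSE
)

# x ~ date: group x by date and apply FUN to each group.
# The result is a data frame with one row per date, sorted by date.
means <- aggregate(x ~ date, data = df, FUN = mean)
print(means)
```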

You can split your data.frame into a list of data.frames like this:

list.of.dfs <- by(data, data$date, function(x) x)  # by() requires a function; this one returns each subset unchanged
nograpes
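A sketch of `by()` applying a real function instead of only splitting: the third argument is run on each date's rows. With a function that returns a single number, the result is an object of class "by" holding one value per date. The toy data and the name `by_means` are hypothetical.

```r
# Hypothetical toy data in the question's layout (only the columns used here).
data <- data.frame(
  date = c("2012 01", "2012 02", "2012 01", "2012 02"),
  x    = c(43, 54, 47, 43),
  stringsAsFactors = FALSE
)

# Apply a function to each date's subset; d is a data frame of that date's rows.
by_means <- by(data, data$date, function(d) mean(d$x))
print(by_means)

# Strip the "by" wrapper to get a plain numeric vector, ordered by date.
plain_means <- as.vector(by_means)
```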

This is a perfect situation for the plyr package:

require(plyr)
ddply(my_df, .(date), my_function, extra_arg_1, extra_arg_2)

where my_function is the function you want to apply to each of the split data frames, and extra_arg_1 and extra_arg_2 are any extra arguments that need to be passed to that function.

Use ddply (data frame -> data frame) if you want your results in a data frame; dlply returns a list.

Drew Steen
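A sketch of the same split-apply-combine pattern in base R, in case plyr is not available (a swapped-in technique, not what the answer uses): `split()` breaks the data up by date, `lapply()` passes each piece plus an extra argument to the function, and `do.call(rbind, ...)` stacks the one-row results back into a data frame. `my_function` and its `threshold` argument are hypothetical.

```r
# Hypothetical function: takes one date's rows plus an extra argument and
# returns a one-row data frame, the shape ddply() would also expect.
my_function <- function(d, threshold) {
  data.frame(date = d$date[1], n_above = sum(d$x > threshold),
             stringsAsFactors = FALSE)
}

# Hypothetical toy data in the question's layout (only the columns used here).
my_df <- data.frame(
  date = c("2012 01", "2012 02", "2012 01", "2012 02"),
  x    = c(43, 54, 47, 60),
  stringsAsFactors = FALSE
)

# Extra named arguments to lapply() are forwarded to my_function.
pieces   <- lapply(split(my_df, my_df$date), my_function, threshold = 44)
combined <- do.call(rbind, pieces)  # one row per date
rownames(combined) <- NULL
print(combined)
```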