I have a large data frame with 2107377 observations of 46 variables. I have a function that subsets this data frame based on the day of the year:

subset.function = function(dataset,year,focal.year,day.of.year) {
    subset(dataset, year==focal.year & day.of.year<=ifelse(leap_year(focal.year), 112,111))
}

The data were collected from 2004-2014. I want to create 11 data frames from this data frame, each consisting of all of the data associated with the first 111 (or 112, in a leap year) days of the focal year (focal year = 2004, 2005, 2006, etc.).

I could do this by applying my subset function 11 times, each time storing it in a new variable:

variable1 = subset.function(dataset, year, 2004, day.of.year)
variable2 = subset.function(dataset, year, 2005, day.of.year)
...
variable11 = subset.function(dataset, year, 2014, day.of.year),

but that's not very fun. I've tried to use a for loop to do this with fewer lines, but it doesn't work:

test = vector("list")
for (i in 1:years) {
  test[[i]] = subset.function(dataset, year, focal.year[i], day.of.year)
}

This creates a large list with the same number of elements in each item of the list as the original data frame. I've also tried using the apply family of functions:

apply(dataset,  year, focal.year[1:11], day.of.year)

with equally disappointing results. What am I missing?

Nigel Stackhouse

1 Answer

Thanks for the link, @user20650; that blew my mind. I thought subset was the bee's knees, but it behaves very strangely under certain conditions.
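
For anyone wondering what "strange" can look like, here is a minimal sketch of one well-known pitfall: subset evaluates its condition inside the data frame first, so a function argument that shares a name with a column is silently shadowed by that column. The toy data frame and the helper names (with_subset, with_bracket) are made up for illustration, and this may not be the exact problem in my original function.

```r
# Toy data; the column names are illustrative only.
df <- data.frame(year = c(2004, 2004, 2005), value = c(10, 20, 30))

# The argument `year` has the same name as a column, so inside subset()
# the expression `year == year` compares the column with itself.
with_subset <- function(d, year) subset(d, year == year)

# Explicit indexing: d$year is the column, `year` is the function argument.
with_bracket <- function(d, year) d[d$year == year, ]

n_subset  <- nrow(with_subset(df, 2005))   # every row comes back
n_bracket <- nrow(with_bracket(df, 2005))  # only the 2005 row comes back
```

With `[`, it is always explicit which names refer to columns and which refer to ordinary variables.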

After I rewrote my for loop to drop the call to subset, it works just fine!

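# leap_year() is presumably lubridate::leap_year()
test = vector("list", length(timeframe))
for (i in seq_along(timeframe)) {
  # keep the focal year's rows up to day 111 (day 112 in a leap year)
  test[[i]] = data[data$year == timeframe[i] & data$doy <= ifelse(leap_year(timeframe[i]), 112, 111),]
}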
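
The same idea can also be written with lapply, which returns the list directly. This is only a sketch on toy data: is_leap is a hypothetical base-R stand-in for leap_year so the example runs without extra packages, the column names (year, doy) follow the loop above, and the comparison uses `<=` to keep the first 111 (or 112) days, as the question describes.

```r
# Hypothetical stand-in for lubridate::leap_year(), so the sketch runs in base R.
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

# Toy data with the column names used in the loop above (year, doy).
data <- data.frame(
  year = rep(c(2004, 2005), each = 4),
  doy  = rep(c(1, 111, 112, 200), times = 2)
)
timeframe <- sort(unique(data$year))

# One data frame per focal year, built with `[` instead of subset().
first_days <- lapply(timeframe, function(y) {
  cutoff <- if (is_leap(y)) 112 else 111
  data[data$year == y & data$doy <= cutoff, ]
})
names(first_days) <- timeframe

# Row counts per year, as a quick sanity check.
rows_per_year <- sapply(first_days, nrow)
```

Naming the list elements by year means each piece can be pulled out with first_days[["2004"]] rather than by position.
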
Nigel Stackhouse