- I have a dataset with x countries over y years.
- I would like to do a certain analysis (see indicated below, but this code is not the problem)
- The problem: I would like to do this analysis of the code I already have, a number of times: each time with a different dataset that has another combination of the x countries and y years. To be clear: I would like to do the analysis for EACH possible combination of the x countries and the y years.
The code that I would like to execute for each version of the dataset (explanation dataset see further)
library(stats)
##### the analysis for one dataset ####
d=data.frame(outcome_spring=rep(1,999),outcome_summer=rep(1,999),
outcome_autumn=rep(1,999),outcome_winter=rep(1,999))
o <- lapply(1:999, function(i) {
Alldata_Rainfed<-subset(Alldata, rainfed <= i)
outcome_spring=sum(Alldata$spring)
outcome_summer=sum(Alldata$summer)
outcome_autumn=sum(Alldata$autumn)
outcome_winter=sum(Alldata$winter)
d[i, ] = c(outcome_spring, outcome_summer, outcome_autumn, outcome_winter)
} )
combination<-as.data.frame(do.call(rbind, o)) #the output I want is another dataset for each unique dataset
#### the end of the analysis for one dataset ####
Desired output
That means that as an output I need to have the same amounts of datasets (named "combination" in the example) as the number of combinations possible between x countries and y years.
As an example, imagine having the following dataset (real dataset has over 500000 observations, 15 countries, 9 years)
> dput(Alldata)
structure(list(country = c("belgium", "belgium", "belgium", "belgium",
"germany", "germany", "germany", "germany"), year = c(2004, 2005,
2005, 2013, 2005, 2009, 2013, 2013), spring = c(23, 24, 45, 23,
1, 34, 5, 23), summer = c(25, 43, 654, 565, 23, 1, 23, 435),
autumn = c(23, 12, 4, 12, 24, 64, 23, 12), winter = c(34,
45, 64, 13, 346, 74, 54, 45), irrigation = c(10, 30, 40,
300, 288, 500, 996, 235), id = c(1, 2, 2, 3, 4, 5, 6, 6)), datalabel = "", time.stamp = "14 Nov 2016 20:09", .Names = c("country",
"year", "spring", "summer", "autumn", "winter", "irrigation",
"id"), formats = c("%9s", "%9.0g", "%9.0g", "%9.0g", "%9.0g",
"%9.0g", "%9.0g", "%9.0g"), types = c(7L, 254L, 254L, 254L, 254L,
254L, 254L, 254L), val.labels = c("", "", "", "", "", "", "",
""), var.labels = c("", "", "", "", "", "", "", "group(country year)"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8"), version = 12L, class = "data.frame")
In the example above, I already made an id for combining country and year. That means I want to make datasets with all observations that have combinations of the following ids:
- dataset 1_2_3_4_5: ids 1, 2, 3, 4, 5 (so this dataset only misses the observations with id = 6)
- dataset 1_2_3_4_6: ids 1, 2, 3, 4, 6 (but not 5)
- dataset 1_2: ids 1, 2 (but not all the rest)
- dataset 3_4_5: ids 3, 4, 5 (but not all the rest)
- ....
etc etc... Note that I gave the name of the dataset the name of the ids that are included. Otherwise it will be hard for me to distinguish all the different datasets from each other. Other names are fine too, as long as I can distinguish between the datasets!
Thanks for your help!
EDIT: it might be possible that certain datasets give no results (because in the second loop irrigation is used too loop and certain combinations might not have irrigation) but then the output should just be a dataset with missing values