How to run the same code on multiple different datasets in R

Question

I am trying to run a series of fixed effects linear regressions on six different sets of data. For each dataset, I would like to run the regression multiple times on subsets of the data.

I have developed code to do this once, for one dataset. But I would like to write generic code, so that I can run this for each of the six separate sets of data.

This is what I have so far using an example dataset:

month <- rep(0:35, each = 36)  # each month holds all 36 products, so month and product are not collinear
monthfact <- as.factor(month)
prodid2<- as.character(rep(112:147, 36))
log_value <- rnorm(1296)
exp_share <- abs(rnorm(1296))
regdat <- data.frame(month, monthfact, prodid2, log_value, exp_share)
#Subset the data into 24 datasets, each of which includes a 13 month window
subfun<-function(x,y,z) {  subset(x,y>=z & y<=z+12)}
dsets <- lapply(1:24, function(x) subfun(regdat, regdat$month, x-1)) 
#Writing a function for running linear regressions
lmfun <- function(data) {
  lm(log_value ~ monthfact + prodid2, data = data, weights = exp_share)
}
#Apply the function to all the datasets in the list
linreg<-lapply(dsets,lmfun)
coefs<-lapply(linreg,coef)
#Choose only the coefficients for month 
coefs <- as.data.frame(lapply(coefs, function(x) {x[2:13]}))
#Add in a row of 0 values for the baseline month
baseline<-rep(0,each=24)
coefs<-rbind(baseline,coefs)

#Compute the index using the dataframe created
FEindexes<-data.frame(lapply(coefs, function(x) (exp(x))/(exp(x[1]))))
splices<-FEindexes[2,]
splices <- apply(splices, 1, cumprod)
splices <- c(1,splices[1:23])
FEindex13<-t(FEindexes[13,])
FEWS<-splices*FEindex13
FEWS<-as.data.frame(FEWS[2:24])
firstFEWS<-as.data.frame(FEindexes[,1])
colnames(firstFEWS) <- "FEWS_index"
colnames(FEWS) <- "FEWS_index"
FEWS<-rbind(firstFEWS,FEWS)
View(FEWS)

I would like to run all of this code on 6 different datasets, and wondered if there's a way to do this in R without having to re-run all the code 6 times?

Thanks very much for your help.

I'd recommend putting the data.frames into a list and then using `lapply` to run through them. See gregor's answer to [this post](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) for some tips. — lmo, Mar 01 '17 at 00:11
It seems that your lmfun is doing just what you are looking for, isn't it? But you stopped at the regression level. Now you need to go further by wrapping the rest of the process inside a function as well. — DJJ, Mar 01 '17 at 00:12

score 1 · Answer 1 · answered Oct 25 '17 at 13:58

Your example code is a bit complex, so I will explain it with a simpler example:

If you are OK with splitting your R script, you could put the code you want to run in one script and call it from a second script via source(...), once per data set. Very simple example: save this script as "my_functions.R" in your working directory (or specify the file's location when you call source()):

plot(my.data)

Assuming you have a list with all your data sets (this also works with data frame columns or whatever structure you have), call the first script via source():

list.of.my.data <- list(a=1:10, b=11:20, c=21:30)
for (i in seq_along(list.of.my.data)){
  my.data <- list.of.my.data[[i]]
  source("my_functions.R")
  }
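
The plot() script above produces no value you could keep. As a rough sketch of how the same source() pattern can collect one result per data set, the version below swaps the plot for a mean. The script content, the temporary file and the names my.result and results are assumptions made for illustration, not part of the original example:

```r
# Stand-in for saving "my_functions.R" by hand: write a one-line script
# to a temporary file. The script expects a variable called my.data.
script <- file.path(tempdir(), "my_functions.R")
writeLines("my.result <- mean(my.data)", script)

list.of.my.data <- list(a = 1:10, b = 11:20, c = 21:30)
results <- numeric(length(list.of.my.data))

for (i in seq_along(list.of.my.data)) {
  my.data <- list.of.my.data[[i]]
  # local = TRUE evaluates the script in the calling environment,
  # so it can see my.data and leaves my.result behind.
  source(script, local = TRUE)
  results[i] <- my.result
}
results
```

Note that the script and the loop communicate only through variable names (my.data, my.result), which is easy to break by accident; this is the main reason the function-based approach is usually preferred.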

Instead, if you prefer to keep everything in one R-script you could write one huuuge function and call this function with every data set as input:

# Example: set of data frames in a list
list.of.data.sets <- list(a=data.frame(x=1:10, y=1:10),
  b=data.frame(x=1:10, y=11:20),
  c=data.frame(x=1:10, y=21:30)
  )
# The meta function where you define all the things you want to do to your data sets:
my.meta.function <- function(my.data, color.parameter, size.parameter){
  plot(y~x, data=my.data, cex=size.parameter, col=color.parameter) 
  my.mean <- mean(my.data$y)
  return(my.mean)
  }
# Call the function for each data set with a for-loop:
for(i in seq_along(list.of.data.sets)){
    my.meta.function(my.data=list.of.data.sets[[i]], size.parameter=4, color.parameter=20)
    }
# Call the function for each data set with lapply (shorter, and it collects the return values in a list):
results.of.all.data.sets <- lapply(list.of.data.sets, FUN=my.meta.function, size.parameter=4, color.parameter=20)
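
Applied to the code in the question, the same idea could look roughly like the sketch below: the rolling-window regressions and the splicing step go into one function, and lapply runs it over a list of six data sets. The names make_example and fews_index and the window argument are made up for this illustration, and the six data sets are simulated stand-ins for the real ones. The simulated data uses each = 36 so that every month contains every product.

```r
# Hypothetical helper: simulate one data set shaped like regdat in the
# question (36 months x 36 products). Real data would replace this.
make_example <- function(seed) {
  set.seed(seed)
  month <- rep(0:35, each = 36)
  data.frame(
    month     = month,
    monthfact = as.factor(month),
    prodid2   = as.character(rep(112:147, times = 36)),
    log_value = rnorm(1296),
    exp_share = abs(rnorm(1296))
  )
}

# The question's steps as one function that depends only on its arguments.
fews_index <- function(regdat, window = 12) {
  starts <- 0:(max(regdat$month) - window)   # 0..23 for 36 months
  # One weighted regression per rolling window; keep the month coefficients
  # and put a 0 in front for the baseline month.
  coefs <- sapply(starts, function(s) {
    d <- regdat[regdat$month >= s & regdat$month <= s + window, ]
    fit <- lm(log_value ~ monthfact + prodid2, data = d, weights = exp_share)
    c(0, coef(fit)[2:(window + 1)])
  })
  idx <- exp(coefs)                          # one column per window
  n <- ncol(idx)
  # Chain the windows together using the second-month movement of each one.
  splices <- c(1, cumprod(idx[2, ])[seq_len(n - 1)])
  fews <- splices * idx[window + 1, ]
  # Full first window, then the spliced last month of each later window.
  data.frame(FEWS_index = unname(c(idx[, 1], fews[-1])))
}

# Six data sets in a named list, then one call per data set.
datasets <- lapply(1:6, make_example)
names(datasets) <- paste0("set", 1:6)
all_indexes <- lapply(datasets, fews_index)
str(all_indexes$set1)
```

Each element of all_indexes is a data frame with one FEWS_index column, so a single result can be pulled out by name (all_indexes$set3) or all six can be combined afterwards.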
