I have subset a data frame into overlapping three-month periods, named jfm (January to March), fma (February to April), mam (March to May), and so on through ond (October to December). I want to run the same analysis on each of these subsets, using several variables as regressors. Below I show how I run the analysis for two of the subsets, using one of the pollutants as a regressor. I would like to run it for all pollutants (pm10median, pm25median, o3median and so2median), each entered into the model separately. How can I do this for all of the data frames?

library(gamair) 
library(mgcv)
data(chicago) 
chicago$date<-seq(from=as.Date("1987-01-01"), to=as.Date("2000-12-31"),length=5114)


chicago$month<-as.numeric(format(chicago$date,"%m")) ## create month
jfm <- subset(chicago, month %in% c(1:3) )      ## subset data for January to March
fma <- subset(chicago, month %in% c(2:4) )  ## February to April
mam <- subset(chicago, month %in% c(3:5) )  ## March to May


jfm$trend<-seq(dim(jfm)[1])   ## create a trend for this df based on its number of rows
fma$trend<-seq(dim(fma)[1])   ## trend for df fma


## Regress death on a single pollutant (pm10median) for the first two subsets

model1<-gam(death ~  pm10median + s(trend,k=21)+ s(tmpd,k=6) ,family=quasipoisson,na.action=na.omit,data=jfm) 

model2<-gam(death ~  pm10median + s(trend,k=21)+ s(tmpd,k=6) ,family=quasipoisson,na.action=na.omit,data=fma) 
Meso

1 Answer

# create a function that defines the exact regression
# you want to run on all three-month data sets
fun <- 
    function( y , x ){

        # keep only the rows that fall in this three-month window
        d <- x[ x$month %in% y , ]

        # `trend` is not a column of `chicago`, so build it per subset,
        # the same way the question does for jfm and fma
        d$trend <- seq( nrow( d ) )

        # store each of the regression outputs into an object
        a <- gam(
            death ~  pm10median + s(trend,k=21)+ s(tmpd,k=6) ,
            family = quasipoisson , 
            na.action = na.omit ,
            data = d
        ) 
        b <- gam(
            death ~  pm25median + s(trend,k=21)+ s(tmpd,k=6) ,
            family = quasipoisson , 
            na.action = na.omit ,
            data = d
        ) 

        # return each of the regressions as a list
        list( a , b )
    }

# define which three-month groups you want to run it on
months <- cbind( 1:10 , 2:11 , 3:12 )

# now just run the function for each row in `months`
results <- apply( months , 1 , fun , x = chicago )

# look at the whole thing
results

# extract jfm, for example
jfm <- results[[1]]

# extract fma (and print it to the screen as well)
( fma <- results[[2]] )
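
The function above hard-codes two of the four pollutants. One possible way to cover all four is to build the formula from the pollutant name, so each pollutant enters its own model. This is only a sketch: `make_formula`, `fit_window` and `window_names` are hypothetical names introduced here, and the fitting step assumes `mgcv` is installed and that `chicago` already has the `month` column created in the question.

```r
# pollutant columns named in the question
pollutants <- c("pm10median", "pm25median", "o3median", "so2median")

# hypothetical helper: build one model formula per pollutant
make_formula <- function(pollutant) {
    reformulate(
        c(pollutant, "s(trend, k = 21)", "s(tmpd, k = 6)"),
        response = "death"
    )
}

formulas <- lapply(pollutants, make_formula)
names(formulas) <- pollutants

# label the ten three-month windows: "jfm", "fma", ..., "ond"
months <- cbind(1:10, 2:11, 3:12)
initials <- tolower(substr(month.name, 1, 1))
window_names <- apply(months, 1, function(m) paste(initials[m], collapse = ""))

# hypothetical helper: fit every pollutant model for one window of months
# (assumes mgcv is installed and `x` has a `month` column)
fit_window <- function(m, x) {
    d <- x[x$month %in% m, ]
    d$trend <- seq_len(nrow(d))   # trend restarts within each subset
    lapply(formulas, function(f) {
        mgcv::gam(f, family = quasipoisson, na.action = na.omit, data = d)
    })
}

# usage, once `chicago` and its `month` column exist:
# results <- lapply(seq_len(nrow(months)),
#                   function(i) fit_window(months[i, ], chicago))
# names(results) <- window_names
# summary(results$jfm$o3median)
```

Naming both levels of the list means a model can be pulled out by window and pollutant rather than by position. Whether every pollutant has enough non-missing observations in every window to support these smooths is something to check on the actual data.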
Anthony Damico
  • Dear Anthony, Thanks for your wonderful code, it worked on the sample, as well as on my own data for a single regressor. Could you add to your code how to loop through different pollutants? BTW: I like your twotorials ! – Meso Feb 05 '13 at 11:34
  • @user1754610 not until you provide a reproducible example. [read this and edit your question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Anthony Damico Feb 05 '13 at 13:04
  • I am referring to the variables in the sample chicago data set(pm10median, pm25median, o3median and so2median). See my edit. – Meso Feb 05 '13 at 13:25
  • @user1754610 see edit. next time, please ask all your questions at once :P – Anthony Damico Feb 05 '13 at 14:08