I have a large dataset with the presence and absence (1/0) of Blue Rockfish and several predictor variables (in my case, bathymetry, curvature, eastness, fine-scale BPI, and broad-scale BPI). Here is a sample:
structure(list(Pres_Abs = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L), CommonName = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Blue Rockfish", class = "factor"), Survey_Yea = c(2009L,
2014L, 2005L, 2015L, 2006L, 2009L, 2014L, 2015L, 2015L, 2015L,
2005L, 2014L, 2015L, 2015L, 2014L, 2015L, 2015L, 2015L, 2015L,
2006L), ca_10mbath = c(-42.6913986, -36.0038986, -36.5155983,
-44.4014816, -39.3320007, -45.7226982, -47.9375, -51.5976982,
-37.7812996, -14.1093302, -70.5976562, -41.5121307, -48.4246902,
-46.0937996, -38.9961014, -46.375, -42.6913986, -60.96875, -46.375,
-37.6601601), ca10_bpi24 = c(-12L, -2L, -2L, -2L, -2L, -2L, 7L,
37L, -2L, 77L, -2L, -2L, 17L, 7L, -2L, -2L, -12L, -2L, -2L, 67L
), ca_10m_cur = c(-0.0859528, -0.0006409, -0.0068855, -0.5154228,
-0.0390663, -0.0078316, -0.0221901, 0.792961, 0, 4.3303394, 0.0429688,
-0.4405556, -0.1947556, 0.0195274, -0.230453, -0.0093803, -0.0859528,
-0.2148438, -0.0093803, 0.0976486), ca_10m_eas = c(0.727106,
0.887252, 0.565906, 0.9994883, 0.96552, 0.960033, 0.998732, 0.772206,
0.589553, -0.4134142, -0.8266082, -0.3659272, -0.7330094, 0.0329623,
0.998884, 0.271237, 0.727106, -0.5498384, 0.271237, 0.6424425
), ca10_bpi30 = c(-15L, -15L, -15L, -15L, -15L, -15L, -15L, -15L,
-15L, 262L, -15L, -15L, -15L, -15L, -15L, -15L, -15L, -15L, -15L,
-15L)), row.names = c(2032L, 3801L, 479L, 4421L, 997L, 1551L,
3079L, 4657L, 5059L, 4104L, 261L, 2849L, 4460L, 4765L, 3535L,
4842L, 4950L, 4323L, 4833L, 752L), class = "data.frame")
In addition, I have multiple years of data (2005, 2006, 2007, 2009, 2014, 2015). I basically want to fit a glm of the form
glm(Pres_Abs ~ bathy + curvature + eastness + broadscale + finescale, data = Blue_allyears, family = binomial(link = logit))
that goes through every combination of years. So, at the 1-year level, I fit one glm using the data from 2005, another using the data from 2006, another using the data from 2007, and so on. Within that code, I save values such as the AIC, the residual and null deviance, the chi-square statistic, the p-value, etc.
This is the code (adapted from someone else's on Stack Overflow) that I used to loop through the data one year at a time:
results <- data.frame()
for (Survey_Yea in unique(Blue_allyears$Survey_Yea)) {
  # model formula (the same for every year)
  fmla <- Pres_Abs ~ ca_10mbath + ca_10m_cur + ca_10m_eas + ca10_bpi30 + ca10_bpi24
  # fit the glm on this year's rows only
  fit <- glm(fmla, data = Blue_allyears[Blue_allyears$Survey_Yea == Survey_Yea, ], family = binomial(link = logit))
  # capture summary stats
  Deviance <- deviance(fit)
  NullDeviance <- fit$null.deviance
  null_minus_dev <- NullDeviance - Deviance
  df.residual <- fit$df.residual
  df.null <- fit$df.null
  df.null.minus.df.residual <- df.null - df.residual
  # chi-square test of the drop in deviance relative to the null model
  pvalue <- 1 - pchisq(null_minus_dev, df.null.minus.df.residual)
  # get coefficients of fit (not stored in results yet)
  cfit <- coef(summary(fit))
  # create temporary one-row data frame for this year
  df <- data.frame(Survey_Yea = Survey_Yea,
                   AIC = AIC(fit),
                   Deviance = Deviance,
                   NullDeviance = NullDeviance,
                   null.minus.dev = null_minus_dev,
                   df.residual = df.residual,
                   df.null = df.null,
                   df.null.minus.df.residual = df.null.minus.df.residual,
                   pvalue = pvalue,
                   Years = "1",
                   stringsAsFactors = FALSE)
  # bind rows of temporary data frame to the results data frame
  results <- rbind(results, df)
}
results
This code worked well and fit a glm to each year of data, giving:
structure(list(Survey_Yea = c(2005L, 2006L, 2007L, 2009L, 2014L,
2015L), AIC = c(731.84838805646, 480.699964265887, 113.681123536743,
764.359566454308, 1482.05275641814, 1581.2853892652), Deviance = c(719.84838805646,
468.699964265887, 101.681123536743, 752.359566454308, 1470.05275641814,
1569.2853892652), NullDeviance = c(987.041585117362, 690.374591837705,
174.673089501106, 1059.1288918956, 2412.15218834861, 2012.89941234608
), null.minus.dev = c(267.193197060902, 221.674627571818, 72.991965964363,
306.769325441288, 942.099431930472, 443.614023080884), df.residual = c(706L,
492L, 120L, 758L, 1734L, 1446L), df.null = c(711L, 497L, 125L,
763L, 1739L, 1451L), df.null.minus.df.residual = c(5L, 5L, 5L,
5L, 5L, 5L), pvalue = c(0, 0, 2.44249065417534e-14, 0, 0, 0),
Years = c("1", "1", "1", "1", "1", "1")), row.names = c(NA,
-6L), class = "data.frame")
Now, I would like to go through two years of data at a time, create the glms, and extract the associated data. So, for example, the year iterations would be:
2005 and 2006
2005 and 2007
2005 and 2009
2005 and 2014
2005 and 2015
2006 and 2007
2006 and 2009
etc....
2014 and 2015
After doing this with two years of data, I'd like to go through every combination with three years of data, etc. until I get to using all years of data.
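To make the target concrete, the iteration described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested solution for the real data: `toy` is a simulated stand-in for `Blue_allyears`, `fit_years` is a hypothetical helper name, and the `Survey_Yea` label (years pasted together with `_`) and the numeric `Years` column are illustrative choices. It relies on `combn(x, k, simplify = FALSE)` returning a list of all size-`k` subsets, and on `%in%` to select the rows for several years at once.

```r
# Simulated stand-in for Blue_allyears (hypothetical data, for illustration only)
set.seed(1)
years <- c(2005L, 2006L, 2007L, 2009L, 2014L, 2015L)
n <- 600
toy <- data.frame(
  Survey_Yea = sample(years, n, replace = TRUE),
  ca_10mbath = runif(n, -70, -15),
  ca_10m_cur = rnorm(n, 0, 0.3),
  ca_10m_eas = runif(n, -1, 1),
  ca10_bpi30 = rnorm(n, 0, 20),
  ca10_bpi24 = rnorm(n, 0, 20)
)
toy$Pres_Abs <- rbinom(n, 1, plogis(-2 - 0.05 * toy$ca_10mbath))

fmla <- Pres_Abs ~ ca_10mbath + ca_10m_cur + ca_10m_eas + ca10_bpi30 + ca10_bpi24

# Hypothetical helper: fit one glm on the rows belonging to a set of years
# and return the same summary columns as the single-year loop
fit_years <- function(yrs, data) {
  fit <- glm(fmla, data = data[data$Survey_Yea %in% yrs, ],
             family = binomial(link = "logit"))
  dev_drop <- fit$null.deviance - deviance(fit)
  df_drop <- fit$df.null - fit$df.residual
  data.frame(
    Survey_Yea = paste(yrs, collapse = "_"),  # e.g. "2005_2006"
    AIC = AIC(fit),
    Deviance = deviance(fit),
    NullDeviance = fit$null.deviance,
    null.minus.dev = dev_drop,
    df.residual = fit$df.residual,
    df.null = fit$df.null,
    df.null.minus.df.residual = df_drop,
    pvalue = 1 - pchisq(dev_drop, df_drop),
    Years = length(yrs),                      # number of years combined
    stringsAsFactors = FALSE
  )
}

all_years <- sort(unique(toy$Survey_Yea))

# For each subset size k = 1..6, list every combination of k years,
# then flatten the list of lists into one list of year vectors
combos <- unlist(
  lapply(seq_along(all_years), function(k) combn(all_years, k, simplify = FALSE)),
  recursive = FALSE
)

# One glm per combination, stacked into a single results data frame
results <- do.call(rbind, lapply(combos, fit_years, data = toy))
```

With six survey years this gives one row per non-empty subset of years, and the two-year models can be pulled out with `results[results$Years == 2, ]`.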
I have played around with adding another for loop or using combn() in my code, but with no luck.
Any help would be much appreciated!
Also, this is my first time posting so let me know if you need more data. Thanks!