4

I'm working on fitting a generalized linear model in R (using glm()) for some data that has two predictors in full factorial. I'm confident that the gamma family is the right error distribution to use but not sure about which link function to use so I'd like to test all possible link functions against one another. Of course, I can do this manually by making a separate model for each link function and then compare deviances, but I imagine there is a R function that will do this and compile results. I have searched on CRAN, SO, Cross-validated, and the web - the closest function I found was clm2 but I do not believe I want a cumulative link model - based on my understanding of what clm's are.

My current model looks like this:

CO2_med_glm_alf_gamma <- glm(flux_median_mod_CO2~PercentH2OGrav+
                           I(PercentH2OGrav^2)+Min_Dist+
                           I(Min_Dist^2)+PercentH2OGrav*Min_Dist, 
                  data = NC_alf_DF,
                  family=Gamma(link="inverse"))

How do I code this model into an R function that will do such a 'goodness-of-link' test?

(As far as the statistical validity of such a test goes, this discussion as well as a discussion with a stats post-doc lead me to believe that is valid to compare AIC or deviances between generalized linear models that are identical except for having different link functions)

Community
  • 1
  • 1
DirtStats
  • 559
  • 9
  • 29
  • 1
    It's not really valid to compare deviance across non-nested models. I think your problem is statistical and belongs on [stats.se]. Once you know exactly what that right way to compare models is, but are unsure how to do it in R, then that's more of a Stack Overflow question. If it is just a question of programming, then add a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and desired output. – MrFlick Apr 20 '15 at 23:33
  • 1
    Initially I wasn't sure if the comparison (two identical models with identical data, only the link function varies) was statistically valid but [this discussion](http://stats.stackexchange.com/questions/46833/problem-with-comparing-glm-models-having-a-different-link-function) supports the idea that it is valid. – DirtStats Apr 21 '15 at 20:49
  • I have edited the question to try to bring my question out of the possibility of being off-topic, I hope my edits have helped. I see this question as similar to [this heavily discussed 'recommendation' question](http://meta.stackoverflow.com/questions/254393/what-exactly-is-a-recommendation-question) which, I believe, was edited and then decided to be a duplicate. In my edits I have tried to make it clear that this a ultimately a coding question, one where further background in available function may be useful. – DirtStats Apr 21 '15 at 21:38
  • If the edits are inadequate to bring it in line, I'd be happy to edit further. – DirtStats Apr 21 '15 at 21:39
  • I think all you need is a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on which possible solutions can be verified. Ideally with the desired output. That even more clearly makes it a programming problem. As Ben points out, being more clear about which links you want to try ("all possible" simply isn't defined since you could have infinitely many) would also help. – MrFlick Apr 21 '15 at 21:57

2 Answers2

3

This is not "all possible links", it's testing against a specified class of links, but there is a goodness-of-link test by Pregibon that is implemented in the LDdiag package. It's not on CRAN, but you can install it from the archives via

devtools::install_version("LDdiag","0.1")

The example given (not that exciting) is

quine$Days <- ifelse(quine$Days==0, 1, quine$Days)
ex <- glm(Days ~ ., family = Gamma(link="log"), data = quine)
pregibon(ex)

The pregibon family of link functions is implemented in the glmx package. As pointed out by Achim Zeleis in comments, the package provides various parametric link functions and supports general estimation and inference based on such parametric links (or more generally parametric families). To see a worked example how this can be employed for a variety of goodness-of-link assessements, see example("WECO", package = "glmx"). This replicates the analyses from two papers by Koenker and Yoon (see below).

This example might be useful too.

Community
  • 1
  • 1
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 1
    To add some more information to Ben's comment about `glmx`. The package provides various parametric link functions and supports general estimation and inference based on such parametric links (or more generally parametric families). To see a worked example how this can be employed for a variety of goodness-of-link assessements, see `example("WECO", package = "glmx")`. This replicates the analyses from two papers by Roger Koenker (and co-author) in _R News_ and _Journal of Econometrics_, respectively. – Achim Zeileis Apr 21 '15 at 10:44
  • Thank you both for these suggestions and resources. I'll check out `glmx` and the examples. – DirtStats Apr 22 '15 at 17:31
0

I have learned that the dredge function (MuMIn package) can be used to perform goodness-of-link tests on glms, lms, etc. More generally it is a model selection function but allows for a good deal of customization. In this case, you can use the varying option to compare models fit with different link functions. See the Beetle example that they work for details.

DirtStats
  • 559
  • 9
  • 29