3

How do I print glm coefficients for all factor levels, including reference level? summary(glm_obj) prints only the values that deviate from reference values.

I know that those are 0's, but I need this for integration, i.e. telling other systems what factor levels can happen at all.

Sorry if it's too simple, could not find anywhere.

Thanks

To illustrate the problem I am facing:

> glm(Petal.Width~Species,data=iris)  

Call:  glm(formula = Petal.Width ~ Species, data = iris)  

Coefficients:
          (Intercept)  Speciesversicolor   Speciesvirginica  
                0.246              1.080              1.780  

Degrees of Freedom: 149 Total (i.e. Null);  147 Residual
Null Deviance:      86.57 
Residual Deviance: 6.157    AIC: -45.29`

The model description above yields only coefficients for versicolor and virginica, which is, as Dason has noted, absolutely fine from the point of view of the model itself.

However, I needed to share the model with another application, which must know what levels of Species to expect (and e.g. issue a warning in once a new, unstudied level appears).

Summary() gives the same results:

> summary(glm(Petal.Width~Species,data=iris))

Call:
glm(formula = Petal.Width ~ Species, data = iris)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-0.626  -0.126  -0.026   0.154   0.474  

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.24600    0.02894    8.50 1.96e-14 ***
Speciesversicolor  1.08000    0.04093   26.39  < 2e-16 ***
Speciesvirginica   1.78000    0.04093   43.49  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.04188163)

Null deviance: 86.5699  on 149  degrees of freedom
Residual deviance:  6.1566  on 147  degrees of freedom
AIC: -45.285

Number of Fisher Scoring iterations: 2
Prradep
  • 5,506
  • 5
  • 43
  • 84
coulminer
  • 358
  • 2
  • 8
  • 1
    Reference levels don't get a coefficient. They aren't printed because they don't exist. – Dason Jul 08 '14 at 13:10
  • Sure Dason, but please read the question: I would still like be able to print those coefficients since I am sending the model to another application, and that needs to know what coefficients exist at all. – coulminer Jul 08 '14 at 13:52
  • Please consider including a *small* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can better understand and more easily answer your question. – Ben Bolker Jul 08 '14 at 14:34

4 Answers4

4

So I realise this question is pretty old, but a simple solution is to use the dummy.coef function

fit<-glm(Petal.Width~Species,data=iris)  
summary(fit)
dummy.coef(fit)

> dummy.coef(fit)
Full coefficients are 

(Intercept):     0.246                     
Species:        setosa versicolor virginica
                  0.00       1.08      1.78

I hope this helps!

Choppy123
  • 91
  • 7
3

You could re-write the summary.glm method. You can view its source by typing summary.glm into the console, or you can dump the source to a file by using sink first. Most of the display methods are written in R itself so you should be able to skim through the code and just add a line where necessary.

Alternatively, you could define an extra dummy variable for the reference level and add it to the model. R will just give you a warning and set the coefficient to NA. For example:

 # no coefficient for the reference level
l = lm(Sepal.Width~Species,iris)

# make a dummy for the reference level
iris$Speciessetosa = as.numeric(iris$Species == "setosa")

# you get NA for the coefficient on new dummy
l = lm(Sepal.Width~Species+Speciessetosa,iris)

Unfortunately you can't just set l$coefficients[4] = 0 because it won't show up in the print methods. The reason this doesn't work is clear from the source code, I recommend skimming through it.

If you really need 0 instead of NA, you could run the output through sed to change NA to 0 on the line in question, or even save the output as an R character vector and use the built-in gsub function, or change it by hand if you only have a few of these, sink the output to a file and use the find-and-replace feature in the R editor or an editor like Word or Sublime, etc.

shadowtalker
  • 12,529
  • 3
  • 53
  • 96
3

Answered to myself as I think this method fits the purpose better than what has been proposed.

Sharing predictive models is not what the summary.glm method supposed to do, therefore summary(model) does not say much about the data the model is applied to.

There is a solution though -- use PMML, it allows describing both, the model, and the data it should be applied to.

Example:

> library(pmml)
> pmml(glm(Petal.Width~Species,data=iris))
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
 <Header copyright="Copyright (c) 2014 dmitrijsl" description="Generalized Linear     Regression Model">
  <Extension name="user" value="dmitrijsl" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2014-07-15 15:07:51</Timestamp>
 </Header>
 <DataDictionary numberOfFields="2">
  <DataField name="Petal.Width" optype="continuous" dataType="double"/>
  <DataField name="Species" optype="categorical" dataType="string">
   <Value value="setosa"/>
   <Value value="versicolor"/>
   <Value value="virginica"/>
  </DataField>
...

Now also Setosa is in the list for the receiver system to know what to expect, and the model description is there:

...
<ParameterList>
 <Parameter name="p0" label="(Intercept)"/>
 <Parameter name="p1" label="Speciesversicolor"/>
 <Parameter name="p2" label="Speciesvirginica"/>
</ParameterList>
<FactorList>
 <Predictor name="Species"/>
</FactorList>
<CovariateList/>
<PPMatrix>
 <PPCell value="versicolor" predictorName="Species" parameterName="p1"/>
 <PPCell value="virginica" predictorName="Species" parameterName="p2"/>
</PPMatrix>
<ParamMatrix>
 <PCell parameterName="p0" df="1" beta="0.245999999999997"/>
 <PCell parameterName="p1" df="1" beta="1.08"/>
 <PCell parameterName="p2" df="1" beta="1.78"/>
</ParamMatrix>

coulminer
  • 358
  • 2
  • 8
0

rating_factors() returns all coefficients for a glm object:

library(insurancerating)
x <- glm(Petal.Width~Species,data=iris) 
rating_factors(x, exponentiate = FALSE)$df
#>   risk_factor       level est_x signif_x
#> 1 (Intercept) (Intercept) 0.246      ***
#> 2     Species      setosa 0.000         
#> 3     Species  versicolor 1.080      ***
#> 4     Species   virginica 1.780      ***

Created on 2021-11-28 by the reprex package (v0.3.0)

mharinga
  • 1,708
  • 10
  • 23