1

I am puzzling over an issue with glm where interaction terms don't show up in the estimation results unless they are entered as separate variables. I was not able to reproduce the problem using the mtcars data set, but there is nothing unusual about my data.

The first three models below give the same results, while the fourth gives the expected result: Code:

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

Output:

> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), 
    data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.784  -3.079  -2.257   2.729   8.409  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.33000    0.01479   22.31   <2e-16 ***
wgrp1             -0.77032    0.01717  -44.87   <2e-16 ***
pop1_3km_original -0.44447    0.01958  -22.70   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29474  on 2325  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29642

Number of Fisher Scoring iterations: 5

> 
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original, 
    family = binomial(link = "probit"), data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.784  -3.079  -2.257   2.729   8.409  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.33000    0.01479   22.31   <2e-16 ***
wgrp1             -0.77032    0.01717  -44.87   <2e-16 ***
pop1_3km_original -0.44447    0.01958  -22.70   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29474  on 2325  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29642

Number of Fisher Scoring iterations: 5

> 
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Error in terms.formula(formula, data = data) : invalid power in formula
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original, 
    family = binomial(link = "probit"), data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.784  -3.079  -2.257   2.729   8.409  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.33000    0.01479   22.31   <2e-16 ***
wgrp1             -0.77032    0.01717  -44.87   <2e-16 ***
pop1_3km_original -0.44447    0.01958  -22.70   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29474  on 2325  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29642

Number of Fisher Scoring iterations: 5

> 
> df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
> 
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original + squared, 
    family = binomial(link = "probit"), data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.839  -3.056  -2.288   2.755   8.401  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.31527    0.01689  18.669  < 2e-16 ***
wgrp1             -0.76865    0.01720 -44.702  < 2e-16 ***
pop1_3km_original -0.35047    0.05525  -6.344 2.24e-10 ***
squared           -0.07031    0.03878  -1.813   0.0698 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29471  on 2324  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29641

Number of Fisher Scoring iterations: 5
ccmullally
  • 41
  • 3
  • does [This](https://stackoverflow.com/questions/55739777/linear-model-in-r-multiplication-expression) solve your problem? – Donald Seinen Dec 15 '21 at 03:38
  • No, the issue is that the interaction is being dropped in the first 3 approaches, and I cannot see why. – ccmullally Dec 15 '21 at 03:48
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick Dec 15 '21 at 04:52
  • You should fix your terminology. You are not actually dealing with an interaction. – Roland Dec 15 '21 at 07:21

1 Answers1

1

This type of interaction is actually a polynomial, and * won't work here. You may want to add the squared term, but use the I() function to prevent glm from interpreting it as arithmetic operation.

glm(am ~ mpg*mpg, family=binomial(link='probit'), mtcars)$coe  ## won't work
# (Intercept)         mpg 
#  -3.9102065   0.1822684 

glm(am ~ mpg + I(mpg^2), family=binomial(link='probit'), mtcars)$coe  ## works
#  (Intercept)          mpg     I(mpg^2) 
# -1.504927683 -0.064500921  0.006066692 

Or use poly(..., raw=TRUE)

glm(am ~ poly(mpg, 2, raw=TRUE), family=binomial(link='probit'), mtcars)$coe
 # (Intercept) poly(mpg, 2, raw = TRUE)1 poly(mpg, 2, raw = TRUE)2 
# -1.504927683              -0.064500921               0.006066692 

Or calculate the term beforehand.

glm(am ~ mpg + mpgsq, family=binomial(link='probit'), transform(mtcars, mpgsq=mpg^2))$coe
#  (Intercept)          mpg        mpgsq 
# -1.504927683 -0.064500921  0.006066692 
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • Thank you all. My confusion was a product of being used to how Stata treats squared terms and interactions, i.e., in exactly the same way. I'm still new to R. – ccmullally Dec 15 '21 at 12:24
  • 1
    @ccmullally You're welcome! Just noticed I had not considered the ` link=probit`, which is important since `binomial()` defaults to `link='logit'`! BTW if you are looking for robust standard errors and relatives, take a look at [`sandwich`](https://cran.r-project.org/web/packages/sandwich/index.html) and [`lmtest`](https://cran.r-project.org/web/packages/lmtest/index.html) packages. E.g. `coeftest(fit, vcov.=vcovHC(fit, type="HC1"))` gives you the same standard errors as in Stata `robust`, where `fit` is the `glm` output. – jay.sf Dec 19 '21 at 05:29