0

Using data from the National Health Interview Survey, I am hoping to analyze the average marginal effect a variety of demographic factors have on the predicted probability of having hypertension using a logistic regression. To clarify, by average marginal effect I mean that I want to be computing the marginal effect at the mean of every X (like the STATA output).

My issue is that I have both binary and continuous independent variables, but from what I've read, it doesn't make sense to evaluate the binary variables at their mean, since it's either a 0 or 1. I don't know how to make the regression run where I can evaluate the continuous variables at their mean, but not the binary ones. Here is the code I have so far.


#Here I create a data frame of the means of the continuous variables 
mean_df=df %>% select(c(AGE,BMICALC,FAMSIZE,YEARSONJOB,HOURSWRK)) %>% summarise_all(mean)


#here is my regression, variables here not in the line of code above are binary 
logit_margin_diabetes <- glm(DIABETES~scale(AGE)+scale(IMMIGRANT)+scale(HOURSWRK)+scale(BELOW_TWICE_POVERTY)
+scale(BMICALC)+scale(FEMALE)+scale(FAMSIZE)+scale(EDUC_1)+scale(EDUC_2)+scale(EDUC_3)+
scale(EDUC_4)+scale(SMOKE)+scale(MARRIED)+scale(HISP)+scale(AFR_AM)+scale(WHITE), data = df,family="binomial")

#This is the stage where I want to apply the logit so it is evaluated at the means of the continuous variables. But I don't know what to do about the binary variables 
marg_mean<-margins(logit_margin_diabetes,data=mean_df)
summary(marg_mean)

Apologies, it was difficult for me to produce and MRE, since I don't know of a dataset in R that has this sort of information. But if anyone can provide any advice that would be greatly appreciated! Thanks.

Here is the modified output per the comment below. But I would like the output to show the SE,AME,and p values too

margins(logit_margin, at=list(AGE=35.93349,BMICALC=26.90704, FAMSIZE=2.495413, YEARSONJOB=4.538336,
                                        HOURSWRK=32.53768,IMMIGRANT=1,
                                        BELOW_TWICE_POVERTY=1, FEMALE=1,
                                       EDUC_1=1,EDUC_2=1,EDUC_3=1,EDUC_4=1,
                                        SMOKE=1,MARRIED=1,HISP=1,
                                        AFR_AM=1,WHITE=1))
summary(marg_mean)

enter image description here

This is a photo of the new output I see after running summary(marg_mean)

juliah0494
  • 175
  • 11

1 Answers1

1

The margins package takes care of this automatically if you declare a variable to be a factor. See the subsetting section of the vignette or you can inspect the source code to see that marginal effects are computed as differences for factor variables.

Note that the default setting for margins is to compute the "average marginal effect", and not the "marginal effect at the mean". IMO, the default setting is best in most cases, but if you insist on considering a "synthetic" average observation, it is easy to do with the at argument of the margins function.

Code example. In the first case, vs is treated as a continuous variable. In the second, vs is treated as a binary variable.

library(margins)

mod1 <- glm(am ~ hp + vs, data=mtcars, family=binomial)
mod2 <- glm(am ~ hp + factor(vs), data=mtcars, family=binomial)

margins(mod1)
#> Average marginal effects
#> glm(formula = am ~ hp + vs, family = binomial, data = mtcars)
#>        hp       vs
#>  -0.00203 -0.03193

margins(mod2)
#> Average marginal effects
#> glm(formula = am ~ hp + factor(vs), family = binomial, data = mtcars)
#>        hp      vs1
#>  -0.00203 -0.03154

Edit: Here's an example of the at argument:

margins(mod1, at=list(hp=200, vs=0))
Vincent
  • 15,809
  • 7
  • 37
  • 39
  • Great thank you Vincent! All this information is very helpful. So just to clarify, in the last line of code you set vs=0, I should set all binary variables equal to 0 here? – juliah0494 Sep 01 '20 at 01:40
  • Well, it depends *which* marginal effect you care about. Say there are two binary variables `x` and `w` in your model, you might well care about the marginal effect of `x` when `w=1`. In that case, you would set `at=list(w=1)`. But otherwise, it sounds like you got the right idea. – Vincent Sep 01 '20 at 01:48
  • Ah I see, makes sense! Great. One other question, I applied your suggestion with my data, and it did run, but didn't produce the output I am hoping for. I am hoping the final table will provide me with each of the coefficients along with their p-value, standard error, etc. Any ideas on how to do that? I modified my post to give you a sense of what I did. Thank you again! – juliah0494 Sep 01 '20 at 01:56
  • Can you try to run my code and then do `summary(margins(mod1))`. For me, this gives the AME, SE, z, p, lower, upper. – Vincent Sep 01 '20 at 02:04
  • Yes that does run properly. But I'm not able to get it to work when you add the "at" argument in your example: margins(mod1, at=list(hp=200, vs=0)) – juliah0494 Sep 01 '20 at 18:01
  • If it helps, the error I receive is: "Error in attributes(.Data) <- c(attributes(.Data), attrib) : 'names' attribute [1] must be the same length as the vector [0]" – juliah0494 Sep 01 '20 at 18:15
  • No, sorry. I can't help diagnose if I don't have a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) Also, once you find one, it's best to open a new question because this seems like different problem than the original one. It's not ideal to edit an original question (akin to moving the goal post). – Vincent Sep 01 '20 at 19:40