
I am currently trying to plot the predicted probabilities from my logit model in R. I have followed the approach from this link: https://stats.idre.ucla.edu/r/dae/logit-regression/.

I have successfully made plots for Brussels office given the interest group type. However, I want to plot only the individual effects: for example, the predicted probability of Meetings with MEPs given a Brussels office (that is, what is the probability of having meetings with MEPs when you have a Brussels office?). I would also like to see the effect of staff size and/or organisational form on the dependent variable.

I have not found such an approach yet. Any advice?

Thank you in advance.

My variables:

Meetings with MEPs (dependent variable, dummy): 1 = Yes, 0 = No

Interest group type (categorical): 1 = Business, 2 = Consultancies, 3 = NGOs, 4 = Public authorities, 5 = Institutions, 6 = Trade union/prof. org., 7 = Other

Brussels office (dummy): 1 = Yes, 0 = No

Organisational form (categorical): 1 = Individual org., 2 = National association, 3 = European association, 4 = Other

Staff size (count variable, in full-time equivalents): ranges from 0.25 to 40
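
To make the setup concrete, here is a rough sketch with simulated placeholder data (the data frame lobbydata and its columns are made up, but they mirror the variables listed above):

library(ggplot2)

# simulated stand-in for my real data
set.seed(1)
n <- 200
lobbydata <- data.frame(
  meps            = rbinom(n, 1, 0.5),                      # meetings with MEPs (0/1)
  brussels_office = factor(sample(0:1, n, replace = TRUE)),
  group_type      = factor(sample(1:7, n, replace = TRUE)),
  org_form        = factor(sample(1:4, n, replace = TRUE)),
  staff_size      = runif(n, 0.25, 40)
)

fit <- glm(meps ~ brussels_office + group_type + org_form + staff_size,
           data = lobbydata, family = "binomial")

# what I would like: predicted probability of meetings with MEPs with and
# without a Brussels office, holding the other predictors at typical values
newdat <- data.frame(brussels_office = factor(0:1),
                     group_type      = factor(1, levels = 1:7),
                     org_form        = factor(1, levels = 1:4),
                     staff_size      = mean(lobbydata$staff_size))
newdat$prob <- predict(fit, newdata = newdat, type = "response")

ggplot(newdat, aes(x = brussels_office, y = prob)) +
  geom_col()
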

Sara
  • Please try to be more explicit. Include a working example, what you have tried, and what is currently not working. Please see here for guidance: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. It would probably be helpful if you can expand on what you mean when you say "plot the individual effects". – Axeman Jun 11 '20 at 20:54
  • Thank you for your comment. I have updated the question. – Sara Jun 11 '20 at 21:22

2 Answers


Since you didn't provide your data, I'll use the dataset from the UCLA example you're already familiar with. Are you trying to do this (treating rank as a stand-in for one of your categorical variables)?

library(ggplot2)

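# binary.csv is the graduate-admissions dataset used in the UCLA (IDRE) logit example linked in the question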
mydata <- read.csv("binary.csv")
mydata$rank <- factor(mydata$rank)

mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(mylogit)
#> 
#> Call:
#> glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
#>     data = mydata)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.6268  -0.8662  -0.6388   1.1490   2.0790  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -3.989979   1.139951  -3.500 0.000465 ***
#> gre          0.002264   0.001094   2.070 0.038465 *  
#> gpa          0.804038   0.331819   2.423 0.015388 *  
#> rank2       -0.675443   0.316490  -2.134 0.032829 *  
#> rank3       -1.340204   0.345306  -3.881 0.000104 ***
#> rank4       -1.551464   0.417832  -3.713 0.000205 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 499.98  on 399  degrees of freedom
#> Residual deviance: 458.52  on 394  degrees of freedom
#> AIC: 470.52
#> 
#> Number of Fisher Scoring iterations: 4

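# hold gre and gpa at their means and let rank take each of its four levels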
newdata1 <- with(mydata, data.frame(gre = mean(gre), gpa = mean(gpa), rank = factor(1:4)))

newdata1
#>     gre    gpa rank
#> 1 587.7 3.3899    1
#> 2 587.7 3.3899    2
#> 3 587.7 3.3899    3
#> 4 587.7 3.3899    4

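# predicted probability of admission for each rank, on the response (probability) scale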
newdata1$rankP <- predict(mylogit, newdata = newdata1, type = "response")
newdata1
#>     gre    gpa rank     rankP
#> 1 587.7 3.3899    1 0.5166016
#> 2 587.7 3.3899    2 0.3522846
#> 3 587.7 3.3899    3 0.2186120
#> 4 587.7 3.3899    4 0.1846684

ggplot(newdata1, aes(x = rank, y = rankP)) +
   geom_col()

Chuck P
  • Thank you for your answer and suggestion. This is very helpful too. I am trying to visualize the predicted probability of, for example, Staff size on my dependent variable in a line graph. Using your (UCLA) example, I would like to visualize the predicted probability by gpa. Does this make sense? – Sara Jun 11 '20 at 22:39
  • Yes, it makes sense I'll answer more fully tomorrow if no one else does by then. – Chuck P Jun 12 '20 at 00:32

Picking up from yesterday.

library(ggplot2)

# mydata <- read.csv("binary.csv")
str(mydata)
#> 'data.frame':    400 obs. of  4 variables:
#>  $ admit: int  0 1 1 1 0 1 1 0 1 0 ...
#>  $ gre  : int  380 660 800 640 520 760 560 400 540 700 ...
#>  $ gpa  : num  3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
#>  $ rank : int  3 3 1 4 4 2 1 2 3 2 ...

mydata$rank <- factor(mydata$rank)

mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")

summary(mylogit)
#> 
#> Call:
#> glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
#>     data = mydata)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.6268  -0.8662  -0.6388   1.1490   2.0790  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -3.989979   1.139951  -3.500 0.000465 ***
#> gre          0.002264   0.001094   2.070 0.038465 *  
#> gpa          0.804038   0.331819   2.423 0.015388 *  
#> rank2       -0.675443   0.316490  -2.134 0.032829 *  
#> rank3       -1.340204   0.345306  -3.881 0.000104 ***
#> rank4       -1.551464   0.417832  -3.713 0.000205 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 499.98  on 399  degrees of freedom
#> Residual deviance: 458.52  on 394  degrees of freedom
#> AIC: 470.52
#> 
#> Number of Fisher Scoring iterations: 4

We're going to graph GPA on the x-axis, so let's generate some points.

range(mydata$gpa) # using GPA for your staff size
#> [1] 2.26 4.00

gpa_sequence <- seq(from = 2.25, to = 4.01, by = .01) # 177 points along x axis

This is in the IDRE example, but they made it more complicated than it needs to be. Step one: build a data frame that has our sequence of GPA points, the mean of GRE in every row, and our 4 factor levels repeated 177 times.

constantGRE <- with(mydata, data.frame(gre = mean(gre), # keep GRE constant
                                       gpa = rep(gpa_sequence, each = 4), # once per factor level
                                       rank = factor(rep(1:4, times = 177)))) # the 4 ranks, repeated for each of the 177 GPA values

str(constantGRE)
#> 'data.frame':    708 obs. of  3 variables:
#>  $ gre : num  588 588 588 588 588 ...
#>  $ gpa : num  2.25 2.25 2.25 2.25 2.26 2.26 2.26 2.26 2.27 2.27 ...
#>  $ rank: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...

Make predictions for all 177 GPA values at each of the 4 factor levels (708 rows in total), and put the prediction in a new column called theprediction.

constantGRE$theprediction <- predict(object = mylogit, 
                                      newdata = constantGRE, 
                                      type = "response")

Plot one line per level of rank, coloring each line uniquely. Note that the lines are neither straight, nor perfectly parallel, nor equally spaced.

ggplot(constantGRE, aes(x = gpa, y = theprediction, color = rank)) +
  geom_smooth()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

You might be tempted to just average the lines. Don't. If you want the relationship with GPA (and GRE) without rank in the model, build a new model, because (0.6357521 + 0.4704174 + 0.3136242 + 0.2700262) / 4 is not the proper answer.

Let's do it.

# leave rank out of the model and give it a new name

mylogit2 <- glm(admit ~ gre + gpa, data = mydata, family = "binomial")
summary(mylogit2)
#> 
#> Call:
#> glm(formula = admit ~ gre + gpa, family = "binomial", data = mydata)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.2730  -0.8988  -0.7206   1.3013   2.0620  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -4.949378   1.075093  -4.604 4.15e-06 ***
#> gre          0.002691   0.001057   2.544   0.0109 *  
#> gpa          0.754687   0.319586   2.361   0.0182 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 499.98  on 399  degrees of freedom
#> Residual deviance: 480.34  on 397  degrees of freedom
#> AIC: 486.34
#> 
#> Number of Fisher Scoring iterations: 4

Repeat the rest of the process to get a single line.

constantGRE2 <- with(mydata, data.frame(gre = mean(gre),
                                       gpa = gpa_sequence))
constantGRE2$theprediction <- predict(object = mylogit2, 
                                      newdata = constantGRE2, 
                                      type = "response")
ggplot(constantGRE2, aes(x = gpa, y = theprediction)) +
  geom_smooth()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
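
As a quick sanity check of the earlier point about not averaging the lines (a sketch, with gpa = 4 chosen arbitrarily), you can compare the naive average of the four rank-specific predictions with the refitted model's prediction:

at_gpa4 <- data.frame(gre = mean(mydata$gre), gpa = 4, rank = factor(1:4))

# naive average of the four rank-specific predicted probabilities
mean(predict(mylogit, newdata = at_gpa4, type = "response"))

# prediction from the model refitted without rank
predict(mylogit2,
        newdata = data.frame(gre = mean(mydata$gre), gpa = 4),
        type = "response")

The two numbers are close but not identical, which is why refitting without rank is the cleaner way to get a single curve.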

Chuck P
  • This is so helpful. Thank you so much for taking the time! Is it also possible to include the CIs in the graph? – Sara Jun 12 '20 at 19:36
  • Trivial to add the standard error with `geom_smooth(se = TRUE)`; the IDRE site shows you fancier ideas. – Chuck P Jun 13 '20 at 12:45
  • @ChuckP do you know what to do if the variable on the x-axis is a dummy instead? I tried, and seq() does not seem to work with a dummy. – rr19 Dec 06 '22 at 17:40
  • @rr19 sorry, you lost me. Could you be a little more explicit about what you mean when you say the x-axis is a dummy variable? – Chuck P Dec 07 '22 at 18:08
  • @ChuckP For instance, you have plotted gpa on the x-axis, which is continuous. What if it were instead a variable such as working = 1 and notworking = 0, i.e. a dummy or binary variable? Would the graph in my case then still be similar to yours? – rr19 Dec 07 '22 at 21:13
  • @rr19 yes, glm models can be applied to almost any sort of predictor, not just continuous ones. So if gpa were instead Pass/Fail the methodology would still apply. – Chuck P Dec 08 '22 at 18:12
  • @ChuckP Sorry I forgot to mention that I'm using the `polr` function with logit from the MASS package. Is it still possible then? – rr19 Dec 08 '22 at 18:51