ggpredict with categorical logistic regression in R

Question

I have a data frame called "student" with 4 variables. I would like to fit a multiple logistic regression with one binary dependent variable, "gender", which takes two categorical values (F and M, for female and male), and 3 independent variables: reading_score and math_score are continuous ("double") and lunch is categorical ("character").

To start the logistic regression, I converted the gender variable into a factor (it did not work otherwise). Then I used the "glm" function with the "binomial" family, as below:

student$gender <- as.factor(student$gender)
glm.fit <- glm(gender~. , data = student, family = "binomial")

Now I would like to plot the model with the "ggpredict" function. However, I keep getting the same error:

Error: Discrete value supplied to continuous scale

I tried to plot it using ggpredict as below:

ggpredict(glm.fit) %>% plot()

I have tried many tutorials and read many questions related to this topic, but I have not figured it out yet.

The reason for using ggpredict for plotting is that I have 3 independent variables.

Note: A sample of the data is presented in the figure below.

It helps reproduce the problem when the post includes a data set. An effective way to include one is `dput()`. Run dput, then paste the output into your question. See [rdocumentation](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/dput). If your object is a vector, matrix, table, or data frame and is large, `object |> head() |> dput()` will help give manageably sized output. — Isaiah, Nov 26 '22 at 02:19
I have edited the question and included a sample of the dataset @Isaiah — Deema, Nov 26 '22 at 05:02

score 0 · Accepted Answer · answered Nov 27 '22 at 10:50

The sjPlot package does this very well. You may have a look here for further explanations and examples.

As I wanted to show a plot that comes close to your data, I have created a new data frame with gender as a binomially distributed dependent variable.

library(tidyverse)
library(sjPlot)

set.seed(123) # set seed for reproducibility

# create 1000 reading and math scores randomly
math_score <- sample(60:100, 1000, replace = TRUE)
reading_score <- sample(60:100, 1000, replace = TRUE)
# create lunch randomly from "standard" or "free"
lunch <- sample(c('standard', 'free'), 1000, replace = TRUE)

# make data frame
student <- data.frame(
  math_score = math_score,
  reading_score = reading_score,
  lunch = as.factor(lunch)
) |> 
  # create a probability vector with some bias
  # bias is: if math_score is above its mean
  # prob is 0.8
  mutate(prob = ifelse(
    math_score > mean(math_score),
    0.8, 0.2
  )) |> 
  # create dependent variable as a binomial one
  # with prob as above
  mutate(gender = factor(
    rbinom(n = 1000, size = 1, prob = prob))) |> 
  select(-prob)

# make the fit
glm.fit <- glm(gender~., data = student, family = "binomial")

summary(glm.fit)
#> 
#> Call:
#> glm(formula = gender ~ ., family = "binomial", data = student)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.3264  -0.7876   0.3922   0.8381   2.3096  
#> 
#> Coefficients:
#>                 Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)   -10.884194   0.877704 -12.401   <2e-16 ***
#> math_score      0.120908   0.007850  15.403   <2e-16 ***
#> reading_score   0.014737   0.006551   2.249   0.0245 *  
#> lunchstandard   0.092162   0.152221   0.605   0.5449    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 1386.2  on 999  degrees of freedom
#> Residual deviance: 1045.7  on 996  degrees of freedom
#> AIC: 1053.7
#> 
#> Number of Fisher Scoring iterations: 4

The summary shows the expected result: math_score is strongly significant. The model can be plotted as follows.

plot_model(
  glm.fit, 
  type = "pred", 
  terms = c("math_score", "reading_score", "lunch"), 
  colors = "bw",
  ci.lvl = NA
)
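
If installing a plotting package is a problem, the same kind of prediction plot can be approximated in base R. The following is only a minimal sketch, not what plot_model or ggpredict do internally: it simulates data in the same spirit as above (the names student_sim, fit_sim and grid are made up for this illustration), varies math_score and lunch over a grid, holds reading_score at its mean (an assumption chosen here for simplicity), and draws the predicted probabilities from predict().

```r
# Self-contained base R sketch; all object names here are illustrative only.
set.seed(123)
n <- 1000
student_sim <- data.frame(
  math_score    = sample(60:100, n, replace = TRUE),
  reading_score = sample(60:100, n, replace = TRUE),
  lunch         = factor(sample(c("standard", "free"), n, replace = TRUE))
)
# Outcome is more likely to be 1 when math_score is above its mean
p <- ifelse(student_sim$math_score > mean(student_sim$math_score), 0.8, 0.2)
student_sim$gender <- factor(rbinom(n, size = 1, prob = p))

fit_sim <- glm(gender ~ ., data = student_sim, family = "binomial")

# Prediction grid: vary math_score and lunch, hold reading_score at its mean
grid <- expand.grid(
  math_score    = 60:100,
  reading_score = mean(student_sim$reading_score),
  lunch         = levels(student_sim$lunch)
)
# type = "response" returns probabilities instead of log-odds
grid$prob <- predict(fit_sim, newdata = grid, type = "response")

# One curve per lunch category
plot(prob ~ math_score, data = grid[grid$lunch == "free", ],
     type = "l", ylim = c(0, 1),
     ylab = "Predicted probability of gender = 1")
lines(prob ~ math_score, data = grid[grid$lunch == "standard", ], lty = 2)
legend("topleft", legend = c("free", "standard"), lty = 1:2, title = "lunch")
```

With a factor response, glm treats the first factor level as "failure", so the plotted probability refers to the second level (here "1"; with your data, whichever of F and M comes second in levels(student$gender)).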

PS: It would be more instructive if we could use your real data, or a part of it. Please have a look at dput(), as people normally do not want to type your data in. Another good read is "How to make a good minimum working example".

It seems that I have a problem installing the "sjPlot" library on my system. However, as it works on your side, I will accept the answer and keep trying to install the package. @MarBlo — Deema, Dec 15 '22 at 15:25
You may have a look at https://github.com/strengejacke/sjPlot/issues/640. Updating some packages, especially dplyr, may solve the problem. — MarBlo, Dec 15 '22 at 15:40
