
I'm attempting to implement mixed-effects logistic regression in Python. As a point of comparison, I'm using the glmer function from the lme4 package in R.

I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:

  1. I find the documentation for the statsmodels function neither complete nor clear, so I'm not sure how to use it appropriately.
  2. So far, my attempts have not reproduced the results I get when fitting the model with glmer in R.
  3. I expect that BinomialBayesMixedGLM does not calculate p-values since it is Bayesian, but I can't figure out how to access the full posterior distributions for the parameters.
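
For point 3, here is one possibility, sketched under the assumption that the fit is a Gaussian approximation to the posterior (which is what a MAP/Laplace or variational fit gives you): take the reported posterior means and SDs, draw samples from that Gaussian, and summarize the samples. The numeric values below are copied from the summary further down; they are stand-ins, not computed by statsmodels here.

```python
import numpy as np

# Hypothetical posterior means / SDs for the two fixed effects,
# copied from the fitted model's summary as stand-in values
post_mean = np.array([3.1623, -0.0380])
post_sd = np.array([0.3616, 0.0061])

rng = np.random.default_rng(0)
draws = rng.normal(post_mean, post_sd, size=(10_000, 2))

# 95% credible intervals from the samples, one column per parameter
ci = np.percentile(draws, [2.5, 97.5], axis=0)
print(ci.shape)  # (2, 2)
```

This only captures the Gaussian approximation, not the exact posterior, but it gives workable credible intervals for each parameter.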

As a test case, I'm using the titanic dataset available here.

import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

# Random (variance component) structure: one formula per component
r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()  # MAP fit via Laplace approximation
fit.summary()

#           Type    Post. Mean  Post. SD       SD SD (LB) SD (UB)
# Intercept M           3.1623    0.3616            
# Age       M          -0.0380    0.0061            
# Pclass    V           0.0754    0.5669    1.078   0.347   3.351
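
One detail worth noting when comparing the two outputs (an observation from the table itself, not something taken from the statsmodels docs): the variance-component row appears to be reported on the log-SD scale, since exponentiating the posterior mean reproduces the SD column, and exp(mean ± 2·SD) reproduces the bounds:

```python
import math

# Pclass row from the summary above
post_mean, post_sd = 0.0754, 0.5669

sd = math.exp(post_mean)                # SD column
lb = math.exp(post_mean - 2 * post_sd)  # SD (LB)
ub = math.exp(post_mean + 2 * post_sd)  # SD (UB)

print(round(sd, 3), round(lb, 3), round(ub, 3))  # 1.078 0.347 3.351
```

If that reading is right, the random-intercept SD to compare against glmer's 0.9254 is 1.078, not 0.0754.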

However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial"):

Random effects:
 Groups Name        Variance Std.Dev.
 Pclass (Intercept) 0.8563   0.9254  
Number of obs: 887, groups:  Pclass, 3

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.961780   0.573402   1.677   0.0935 .  
Age         -0.038708   0.006243  -6.200 5.65e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So what error am I making when creating my model in Python? And, once that's addressed, how can I extract either the posteriors or p-values? Lastly, are there any Python implementations of mixed-effects logistic regression that are more akin to the implementation in R?

  • @petezurich -- For my own knowledge, why edit the title so that it no longer makes it clear that my question is specifically about how to fit this type of model in Python? The new title seems much less descriptive for others searching for an answer to a similar question. – YTD Feb 25 '20 at 21:52
  • This is what tags are for. See [this post](https://meta.stackexchange.com/questions/19190/should-questions-include-tags-in-their-titles): *"it is completely unnecessary to include tags in your question titles."* – petezurich Feb 25 '20 at 22:36
  • By the way: your question seems off-topic and will likely get closed. From the [help center](https://stackoverflow.com/help/on-topic): *"Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it."* – petezurich Feb 25 '20 at 22:40
  • @petezurich Thanks for the clarification and tips. After some more work, I've updated my question. – YTD Feb 26 '20 at 15:32
  • Cool! Makes sense. All the best and fingers crossed for your project. – petezurich Feb 26 '20 at 18:00
  • Have you figured out a solution @YTD? I have the same problem. – Blue482 Sep 02 '20 at 02:01
  • @Blue482 Unfortunately, I pretty quickly gave up on this approach because I discovered that Julia has a MixedModels module that is virtually identical to lme4 in R, and it was easy enough to use that instead. I'd still like to know if there's a good module for mixed models in Python... – YTD Sep 02 '20 at 17:07
  • Maybe Bambi or Pymer4 – Blue482 Sep 02 '20 at 17:14
  • Pymer4 looks like it might be exactly what I wanted. I'll have to check it out sometime. – YTD Sep 02 '20 at 18:27
  • I've tried to specify a logit link function in `from_formula`, but I don't know how to use it. IMHO the documentation for this method is poor, with not enough examples provided. – ipj Mar 05 '21 at 16:39
  • I'm a Bambi developer. You can now work with mixed models much as you would in lme4, but using Bayesian statistics and in Python. Have a look at the repo https://github.com/bambinos/bambi, and feel free to open an issue if you need help. – Tomas Capretto Apr 09 '21 at 14:13

1 Answer


I just had to do something similar in Python. As suggested in the comments, Pymer4 appears to offer a suitable approach (especially if you are already familiar with R). Using the example 'titanic' dataset referred to in the question:

from pymer4.models import Lmer

# titanic is the pandas DataFrame loaded in the question
model = Lmer("Survived ~ Age + (1|Pclass)",
             data=titanic, family='binomial')

print(model.fit())

OUT:

Formula: Survived~Age+(1|Pclass)

Family: binomial     Inference: parametric

Number of observations: 887  Groups: {'Pclass': 3.0}

Log-likelihood: -525.812     AIC: 1057.624

Random effects:

               Name    Var    Std
Pclass  (Intercept)  0.856  0.925

No random effect correlations specified

Fixed effects:

             Estimate  2.5_ci  97.5_ci     SE     OR  OR_2.5_ci  OR_97.5_ci  \
(Intercept)     0.962  -0.162    2.086  0.573  2.616       0.85       8.050   
Age            -0.039  -0.051   -0.026  0.006  0.962       0.95       0.974   

              Prob  Prob_2.5_ci  Prob_97.5_ci  Z-stat  P-val  Sig  
(Intercept)  0.723        0.460         0.889   1.677  0.093    .  
Age          0.490        0.487         0.493  -6.200  0.000  *** 
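
As a sanity check on how the extra columns relate to the estimates (plain arithmetic, not a Pymer4 API): the OR columns are just the exponentiated log-odds estimates, and the Prob columns are the corresponding logistic transform. Using the glmer estimates from the question:

```python
import math

def to_or_and_prob(log_odds):
    """Convert a log-odds estimate to an odds ratio and a probability."""
    odds = math.exp(log_odds)
    return odds, odds / (1 + odds)

print(to_or_and_prob(0.961780))   # Intercept: ~ (2.616, 0.723)
print(to_or_and_prob(-0.038708))  # Age:       ~ (0.962, 0.490)
```

These reproduce the OR and Prob columns in the table above, which confirms that Pymer4's Estimate column is on the same log-odds scale as glmer's.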

Just as an additional comment (sorry for diverting from the main question): I ran this on an Ubuntu 20.04 machine with Python 3.8.8. Not sure if anyone else has encountered this problem, but when running the model above, Pymer4 threw an error (the same error appeared when I tried to reproduce a similar model from the Pymer4 documentation here):

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This can be fixed by updating the pymer4 version (see also this related question):

conda install -c ejolly -c conda-forge -c defaults pymer4=0.7.8
  • Using `design_matrix.any()` is likely wrong; the intended meaning of `if design_matrix:` is either checking whether it is `None` (replace with `if design_matrix is not None:`) or checking that it is not an empty array (`if design_matrix.size > 0:`). – Giacomo Petrillo Jan 30 '22 at 12:30
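
To illustrate the point in that comment (a generic NumPy example, unrelated to Pymer4's actual internals): truth-testing a multi-element array raises exactly the error quoted above, while the explicit checks are unambiguous.

```python
import numpy as np

design_matrix = np.array([[1.0, 0.0], [0.0, 1.0]])

# Truth-testing a multi-element array raises ValueError
try:
    if design_matrix:
        pass
except ValueError as e:
    print(e)  # the "truth value of an array ... is ambiguous" error

# Unambiguous alternatives, depending on intent:
if design_matrix is not None:  # "was a matrix supplied at all?"
    print("not None")
if design_matrix.size > 0:     # "is the matrix non-empty?"
    print("non-empty")
```

`.any()` silently changes the semantics to "does any element compare true", which is why the comment flags it as a likely-wrong fix.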