5

I am trying to understand the results of Mixed Linear Models provided by Python statsmodel package. I want to avoid pitfalls in my data analysis and interpretation. The questions are after the data loading/output code block.

Loading data and fitting model:

import statsmodels.api as sm
import statsmodels.formula.api as smf
data = sm.datasets.get_rdataset("dietox", "geepack").data
md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
mdf = md.fit()
print mdf.summary()

Mixed Linear Model Regression Results
========================================================
Model:            MixedLM Dependent Variable: Weight    
No. Observations: 861     Method:             REML      
No. Groups:       72      Scale:              11.3669   
Min. group size:  11      Likelihood:         -2404.7753
Max. group size:  12      Converged:          Yes       
Mean group size:  12.0                                  
--------------------------------------------------------
             Coef.  Std.Err.    z    P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept    15.724    0.788  19.952 0.000 14.179 17.268
Time          6.943    0.033 207.939 0.000  6.877  7.008
Group Var    40.394    2.149                            
========================================================

Q1. (a) What exactly is Group Var coefficient (params)? I thought it is the variance of Group Var (cov_params), but the default output does not match the in-built method output.

Q1. (b) What does "Group Var" parameter (params) means?

print "-----Parameters-----"
print mdf.params
print
print "-----Covariance matrix-----"
print mdf.cov_params()

-----Parameters-----
Intercept    15.723523
Time          6.942505
Group Var     3.553634
dtype: float64

-----Covariance matrix-----
           Intercept      Time  Group Var
Intercept   0.621028 -0.007222   0.000052
Time       -0.007222  0.001115  -0.000012
Group Var   0.000052 -0.000012   0.406197

Q2. (a) What does standard error (bse) for Group Var mean? Why is Group Var estimate not reported in default output? Is it not important?

Q2. (b) How is is different from standard error for variance (bse_re)?

print "-----Standard errors-----"
print mdf.bse
print
print "-----Standard errors of random effects-----"
print mdf.bse_re

-----Standard errors-----
Intercept    0.788053
Time         0.033387
Group Var    0.637336
dtype: float64

-----Standard errors of random effects-----
Group Var    2.148771
dtype: float64

Q3. Why are t-value and p-value not reported for random parameters in summary()?

print "-----t-values (or z-values?)-----"
print mdf.tvalues
print
print "-----p-values-----"
print mdf.pvalues

-----t-values (or z-values?)-----
Intercept     19.952366
Time         207.938608
Group Var      5.575760
dtype: float64

-----p-values-----
Intercept    1.429597e-88
Time         0.000000e+00
Group Var    2.464519e-08
dtype: float64

Reference: https://www.statsmodels.org/dev/mixed_linear.html

0 Answers0