I am trying to understand the results of Mixed Linear Models provided by Python statsmodel package. I want to avoid pitfalls in my data analysis and interpretation. The questions are after the data loading/output code block.
Loading data and fitting model:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data = sm.datasets.get_rdataset("dietox", "geepack").data
md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
mdf = md.fit()
print mdf.summary()
Mixed Linear Model Regression Results
========================================================
Model: MixedLM Dependent Variable: Weight
No. Observations: 861 Method: REML
No. Groups: 72 Scale: 11.3669
Min. group size: 11 Likelihood: -2404.7753
Max. group size: 12 Converged: Yes
Mean group size: 12.0
--------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept 15.724 0.788 19.952 0.000 14.179 17.268
Time 6.943 0.033 207.939 0.000 6.877 7.008
Group Var 40.394 2.149
========================================================
Q1. (a) What exactly is Group Var coefficient (params)? I thought it is the variance of Group Var (cov_params), but the default output does not match the in-built method output.
Q1. (b) What does "Group Var" parameter (params) means?
print "-----Parameters-----"
print mdf.params
print
print "-----Covariance matrix-----"
print mdf.cov_params()
-----Parameters-----
Intercept 15.723523
Time 6.942505
Group Var 3.553634
dtype: float64
-----Covariance matrix-----
Intercept Time Group Var
Intercept 0.621028 -0.007222 0.000052
Time -0.007222 0.001115 -0.000012
Group Var 0.000052 -0.000012 0.406197
Q2. (a) What does standard error (bse) for Group Var mean? Why is Group Var estimate not reported in default output? Is it not important?
Q2. (b) How is is different from standard error for variance (bse_re)?
print "-----Standard errors-----"
print mdf.bse
print
print "-----Standard errors of random effects-----"
print mdf.bse_re
-----Standard errors-----
Intercept 0.788053
Time 0.033387
Group Var 0.637336
dtype: float64
-----Standard errors of random effects-----
Group Var 2.148771
dtype: float64
Q3. Why are t-value and p-value not reported for random parameters in summary()?
print "-----t-values (or z-values?)-----"
print mdf.tvalues
print
print "-----p-values-----"
print mdf.pvalues
-----t-values (or z-values?)-----
Intercept 19.952366
Time 207.938608
Group Var 5.575760
dtype: float64
-----p-values-----
Intercept 1.429597e-88
Time 0.000000e+00
Group Var 2.464519e-08
dtype: float64
Reference: https://www.statsmodels.org/dev/mixed_linear.html