
I have a dataset that looks like the one below (first rows shown). CPA is an observed result from an experiment (treatment) on different advertising flights. Flights are hierarchically nested within campaigns.

  campaign_uid  flight_uid treatment         CPA
0   0C2o4hHDSN  0FBU5oULvg   control  -50.757370
1   0C2o4hHDSN  0FhOqhtsl9   control   10.963426
2   0C2o4hHDSN  0FwPGelRRX   exposed  -72.868952
3   0C5F8ZNKxc  0F0bYuxlmR   control   13.356081
4   0C5F8ZNKxc  0F2ESwZY22   control  141.030900
5   0C5F8ZNKxc  0F5rfAOVuO   exposed   11.200450

I fit a model like the following one:

model.fit('CPA ~ treatment',  random=['1|campaign_uid'])

To my knowledge, this model simply says:

  • We have a slope for treatment
  • We have a global intercept
  • We also have an intercept per campaign

so one would just get one posterior for each such variable.

However, looking at the results below, I also get posteriors for the following variable: 1|campaign_uid_offset. What does it represent?

[traceplot of the posterior samples, including 1|campaign_uid_offset]

Code for fitting the model and the plot:

from bambi import Model
import pymc3 as pm

metric = 'CPA'

model   = Model(df)
results = model.fit('{} ~ treatment'.format(metric),
                    random=['1|campaign_uid'],
                    samples=1000)
# Plotting the result
pm.traceplot(model.backend.trace)
Amelio Vazquez-Reina

1 Answer

  • 1|campaign_uid

These are the random intercepts for campaigns that you mentioned in your list of parameters.

  • 1|campaign_uid_sd

This is the standard deviation of the aforementioned random campaign intercepts.

  • CPA_sd

This is the residual standard deviation. That is, your model can be written (in part) as CPA_ij ~ Normal(b0 + b1*treatment_ij + u_j, sigma^2), and CPA_sd represents the parameter sigma.
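To make that concrete, here is a minimal simulation of the model written out above. All parameter values are made up for illustration; the point is only to show where `1|campaign_uid_sd` (the scale of the u_j) and `CPA_sd` (sigma) enter:

```python
import numpy as np

rng = np.random.default_rng(0)

b0, b1 = 20.0, -15.0   # global intercept and treatment slope (made-up values)
campaign_sd = 30.0     # 1|campaign_uid_sd: sd of the random campaign intercepts
residual_sd = 50.0     # CPA_sd: the residual standard deviation sigma

n_campaigns, flights_per_campaign = 10, 8
u = rng.normal(0.0, campaign_sd, size=n_campaigns)       # u_j, one per campaign
campaign = np.repeat(np.arange(n_campaigns), flights_per_campaign)
treatment = rng.integers(0, 2, size=campaign.size)       # 0 = control, 1 = exposed

# CPA_ij ~ Normal(b0 + b1*treatment_ij + u_j, sigma^2)
mu = b0 + b1 * treatment + u[campaign]
cpa = rng.normal(mu, residual_sd)
```

Note that both standard deviations contribute to the spread of the observed CPA values, but at different levels of the hierarchy.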

  • 1|campaign_uid_offset

This is an alternative parameterization of the random intercepts. bambi uses this transformation internally in order to improve the MCMC sampling efficiency. This transformed parameter is hidden from the user by default; that is, if you make the traceplot using results.plot() rather than pm.traceplot(model.backend.trace), these terms are hidden unless you specify transformed=True (it's False by default). It's also hidden by default from the results.summary() output. For more information about this transformation, see this nice blog post by Thomas Wiecki.
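In code, the offset trick looks roughly like this (a sketch of the non-centered parameterization in plain numpy, not bambi's actual internals):

```python
import numpy as np

rng = np.random.default_rng(1)
campaign_sd = 30.0  # hypothetical value of 1|campaign_uid_sd

# Centered parameterization: sample the random intercepts directly.
u_centered = rng.normal(0.0, campaign_sd, size=5)

# Non-centered parameterization: sample standard-normal "offsets"
# (this is what 1|campaign_uid_offset corresponds to) and scale them.
offset = rng.normal(0.0, 1.0, size=5)
u_noncentered = campaign_sd * offset  # the actual random intercepts

# Both give draws from the same distribution; the non-centered form
# decouples the intercepts from their scale parameter, which removes
# the "funnel" geometry that trips up the sampler.
```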

Jake Westfall
  • Thanks. With this in mind, how do I get a sense of the fitting error? (e.g. for comparing models). I always thought `CPA_sd` would give me this information (i.e. models with *higher residual standard deviation fit worse*, so I could use `CPA_sd` to compare models). Is that realistic to assume? And if that's the case, wouldn't doing so _ignore_ the contribution to the "error" captured by the standard deviation of the per-campaign intercept (i.e. `1|campaign_uid_sd`)? – Amelio Vazquez-Reina Apr 09 '17 at 13:10
  • Also Jake, what does the `T` in `treatment[T.exposed]` mean in the output? – Josh Apr 13 '17 at 21:10
  • @AmelioVazquez-Reina If you're interested in doing model comparison, you may find [this page from the pymc3 docs](https://pymc-devs.github.io/pymc3/notebooks/GLM-model-selection.html) useful. – Jake Westfall Apr 14 '17 at 17:17
  • @Josh That's added by [patsy](https://patsy.readthedocs.io/en/latest/), which is what we use to parse the formulae that users enter. The "T" stands for "treatment coding", a.k.a. dummy coding, which is the default coding scheme used by patsy. For examples of using other coding schemes, see our [shooter notebook](https://github.com/bambinos/bambi/blob/master/examples/shooter_crossed_random_ANOVA.ipynb). – Jake Westfall Apr 14 '17 at 17:19
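Treatment (dummy) coding simply turns a categorical column into 0/1 indicators relative to a reference level. A rough illustration of the same idea using pandas rather than patsy itself (the column name here is analogous to, not identical to, patsy's treatment[T.exposed]):

```python
import pandas as pd

df = pd.DataFrame({"treatment": ["control", "exposed", "control", "exposed"]})

# Drop the reference level ("control") and keep an indicator for "exposed" --
# the slope on this column is the difference between exposed and control.
dummies = pd.get_dummies(df["treatment"], prefix="treatment", drop_first=True)
```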