3

I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version; the real function does more than this):

import statsmodels.formula.api as smf

def wrapper(formula, data, **kwargs):
    return smf.logit(formula, data).fit(**kwargs)

If I give this function to a user, who then attempts to define his/her own function:

def square(x):
    return x**2

model = wrapper('y ~ x + square(x)', data=df)

they will receive a NameError because the patsy module is looking in the namespace of wrapper for the function square. Is there a safe, Pythonic way to handle this situation without knowing a priori what the function names are or how many functions will be needed?

FYI: This is for Python 3.4.3.

chriswhite
  • I don't know the details (too much magic for my taste), but the `statsmodels.base.model.Model.from_formula` docstring describes an `eval_env` keyword in `**kwargs` which you might be able to increment by 1. `from_formula` is inherited by all or most models. – Josef Apr 22 '16 at 18:31
  • Yea, I did try that; didn't seem to work but maybe I didn't call it correctly. – chriswhite Apr 22 '16 at 18:38
  • Did you try setting it to 3? In a similar case I was using try..except wrapping to figure out which depth user functions are in. – Josef Apr 22 '16 at 19:02
  • Example: `statsmodels.base.data.ModelData.__setstate__`, which tries to recreate the formula and design during unpickling. I wrote that by trial and error based on a few examples. – Josef Apr 22 '16 at 19:12
  • @user333700 post this as an answer and I'll accept it; two notes: 1.) I had to set `eval_env = 2` and 2.) this is a keyword to `logit(..)` not to `fit(...)`. (Not that you were implying it was, but I didn't realize that). – chriswhite Apr 22 '16 at 19:26
  • `smf.logit` is an alias of `sm.Logit.from_formula`, that's why I was referring to `from_formula`. – Josef Apr 22 '16 at 21:15
  • Yea, definitely; I just wanted to clarify that the `eval_env` keyword was not for the `fit` method but for the initial formula call (whether via the API or not). – chriswhite Apr 22 '16 at 21:28

2 Answers

2

statsmodels uses the patsy package to parse formulas and create the design matrix. patsy allows user functions as part of formulas and looks them up and evaluates them in the user's namespace (environment).

For reference, see the eval_env keyword in the patsy API reference: http://patsy.readthedocs.org/en/latest/API-reference.html

from_formula is the model method that implements the formula interface to patsy. It uses eval_env to tell patsy which environment to evaluate the formula in; by default this is the calling environment of the user. The user can override it with the corresponding keyword argument.

The simplest way to define eval_env is as an integer that indicates the stack level patsy should use. from_formula increments it to account for the additional stack level added by the statsmodels methods.

According to the comments, eval_env=2 makes patsy use the level one above the code that creates the model, e.g. model = smf.logit(..., eval_env=2).

This creates the model, calls patsy, and builds the design matrix; model.fit() then estimates the model and returns the results instance.
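The stack-level idea can be tried out without statsmodels or patsy. The following is a minimal sketch using only the standard library. The names `evaluate`, `naive_wrapper`, `wrapper`, and the `depth` parameter are hypothetical and only mimic the role that eval_env plays; this is not patsy's actual implementation.

```python
import sys

def evaluate(expr, depth=0):
    """Hypothetical stand-in for a formula engine.

    Evaluates `expr` in the namespace of the frame `depth` levels above
    the caller of evaluate() (depth=0 means the direct caller).
    """
    frame = sys._getframe(depth + 1)
    namespace = dict(frame.f_globals)   # module-level names of that frame
    namespace.update(frame.f_locals)    # its locals shadow its globals
    return eval(expr, namespace)

def naive_wrapper(expr):
    # Default depth: names are looked up in naive_wrapper's own namespace,
    # so functions defined by the user are not visible.
    return evaluate(expr)

def wrapper(expr, depth=0):
    # Add 1 so that `depth` is counted from wrapper's caller instead.
    return evaluate(expr, depth=depth + 1)

def user_code():
    def square(x):
        return x ** 2
    return wrapper("square(3)")

def naive_user_code():
    def square(x):
        return x ** 2
    return naive_wrapper("square(3)")   # raises NameError when called

print(user_code())  # 9
```

Each additional wrapper layer would add 1 in the same way before passing the value on, which is the pattern described above for from_formula.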

Josef
  • 21,998
  • 3
  • 54
  • 67
  • If there are several nested wrappers, would it make sense to take this as an argument and pass it on after incrementing it? Like `def f(..., eval_env=1): ... smf.logit(..., eval_env=eval_env+1)` – Tadhg McDonald-Jensen Apr 22 '16 at 22:16
  • If I understand your comment correctly, then that is what from_formula is doing (https://github.com/statsmodels/statsmodels/blob/master/statsmodels/base/model.py#L138), i.e. it increments the value if provided and passes it on. Overall, I would check security issues before expanding a lot on `eval` approaches. – Josef Apr 22 '16 at 23:01
1

If you are willing to use eval to do the heavy lifting of your function, you can construct a namespace from the arguments to wrapper and the local variables of the outer frame:

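import sys
import statsmodels.formula.api as smf

# Compile once; the names in this expression are looked up in `namespace` below.
wrapper_code = compile("smf.logit(formula, data).fit(**kwargs)",
                       "<WrapperFunction>", "eval")

def wrapper(formula, data, **kwargs):
    outer_frame = sys._getframe(1)           # frame of wrapper's caller
    namespace = dict(outer_frame.f_locals)   # copy the caller's local names
    # Add the wrapper's own inputs; these override same-named caller variables.
    namespace.update(formula=formula, data=data, kwargs=kwargs, smf=smf)
    return eval(wrapper_code, namespace)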

I don't really see this as a cheat, since logit must be doing something similar anyway for it to raise a NameError. As long as wrapper_code is not modified and there are no name conflicts (such as the caller having its own variable called data), this should do what you want.
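The name-conflict caveat can be made visible with a small standard-library sketch. `build_namespace` and `demo` are hypothetical names used only for illustration. Unlike the code above, this version also copies the caller's globals, on the assumption that a caller inside a function may rely on module-level helpers; it then reports which of the caller's names are replaced by the injected ones.

```python
import sys

def build_namespace(depth=1, **injected):
    """Copy the caller's names, add `injected`, and report collisions.

    Hypothetical helper: depth=1 means the caller of build_namespace().
    """
    frame = sys._getframe(depth)
    namespace = dict(frame.f_globals)   # module-level names of the caller
    namespace.update(frame.f_locals)    # caller's locals shadow its globals
    # Names the caller already defined that `injected` is about to replace.
    shadowed = sorted(set(injected) & set(namespace))
    namespace.update(injected)
    return namespace, shadowed

def demo():
    data = "user's own data"            # collides with the injected `data`
    def square(x):
        return x ** 2
    namespace, shadowed = build_namespace(data=[1, 2, 3], formula="y ~ x")
    return eval("[square(v) for v in data]", namespace), shadowed

print(demo())  # ([1, 4, 9], ['data'])
```

The user's `square` is found, but the user's own `data` has been silently replaced by the injected value, which is the kind of conflict mentioned above.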

Tadhg McDonald-Jensen
  • Oh this is super interesting and something I can think about going forward if this gets more complicated. Thanks! – chriswhite Apr 22 '16 at 21:31
  • One issue, that I don't know how it will work in cases like this, is when we (model or results) need the same information again, e.g. for evaluating or transforming the explanatory variables in `predict`. patsy keeps the information around and uses some `eval` magic, AFAIK. – Josef Apr 22 '16 at 23:15