
I'm trying to add two optional arguments to a function that trains a GLM using the statsmodels package. I used this question as a guide when writing the function: How do I create a Python function with optional arguments?

Basically, I want the user to be able to supply weights and offsets, or to leave either one out.

This is the function:

def model_train(df, formula, *args, **kwargs):
    '''
    run non discrete model
    df = model set
    formula = model formula
    weight = column used for weights
    offset = column used for offsets
    '''
    weight = kwargs.get(df[weight], None)
    print(f"Weights initialized....Starting to intialize offsets")

    offset_factor = kwargs.get(df[offset], None)
    #print(f"Offset initialized....starting matrix development")

    y, x = patsy.dmatrices(formula, df, return_type = 'dataframe')
    print(f"Matrix done...starting to instantiate model")

    glm = sm.GLM(y, x, family = sm.families.Poisson(), var_weights = weight, offset = offset_factor)
    print(f"Model instantiated....starting to fit")

    glm_results = glm.fit()
    print("Model fit. If you are reading this, you're done.  Run 'model_object'[0].summary() to get summary statistics")

    return glm_results, x, y

This is the error it throws:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-34-0ce97f02e15e> in <module>
----> 1 model_80150 = model_train(df = train_model1, formula=formula_80150, weight = 'eunit', offset = None)

~\Documents\GitHub\Edit\run_model.py in model_train(df, formula, *args, **kwargs)
      7     offset = column used for offsets
      8     '''
----> 9     weight = kwargs.get(df[weight], None)
     10     print(f"Weights initialized....Starting to intialize offsets")
     11 

UnboundLocalError: local variable 'weight' referenced before assignment
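
For reference, the same error can be reproduced without statsmodels. The sketch below is a minimal illustration, not the original function: `lookup`, `lookup_fixed`, and the plain dict standing in for the DataFrame are hypothetical names chosen for the example. Because `weight` is assigned inside the function, Python treats it as a local name throughout the body, so reading it in `table[weight]` fails before the assignment completes.

```python
def lookup(table, **kwargs):
    # `weight` is assigned below, so it is a local name in this function.
    # The right-hand side reads it first, which raises UnboundLocalError.
    weight = kwargs.get(table[weight], None)
    return weight


def lookup_fixed(table, **kwargs):
    # Step 1: fetch the column name from kwargs using the string key.
    column = kwargs.get("weight")
    # Step 2: only index the table when a column name was actually passed.
    return table[column] if column is not None else None


table = {"eunit": [1, 2]}  # hypothetical stand-in for the DataFrame

error_name = None
try:
    lookup(table, weight="eunit")
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
print(lookup_fixed(table, weight="eunit"))
print(lookup_fixed(table))
```

The key point is that the keyword lookup and the column lookup are two separate steps, and the second should only run when the first returned something.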

EDIT UPDATE:

I've since tried the following, which raises TypeError: unsupported operand type(s) for &: 'NoneType' and 'str':

def model_train(df, formula, *args, **kwargs):
    '''
    run non discrete model
    df = model set
    formula = model formula
    weight = column used for weights
    offset = column used for offsets
    '''


    weight_value = kwargs.get('weight', None)
    print(f"Weights initialized....Starting to intialize offsets")

    offset_factor = kwargs.get('offset', None)
    print(f"Offset initialized....starting matrix development")

    y, x = patsy.dmatrices(formula, df, return_type = 'dataframe')
    print(f"Matrix done...starting to instantiate model")

    if weight_value == None:
        glm = sm.GLM(y, x, family = sm.families.Poisson())

    elif weight_value == None & offset_factor != None:
        glm = sm.GLM(y, x, family = sm.families.Poisson(), offset = df[offset_factor])

    elif weight_value != None and offset_factor == None:
        glm = sm.GLM(y, x, family = sm.families.Poisson(), var_weights = df[weight_value])

    else:
        glm = sm.GLM(y, x, family = sm.families.Poisson(), var_weights = df[weight_value], offset = df[offset_factor])
    print(f"Model instantiated....starting to fit")

    glm_results = glm.fit()
    print("Model fit. If you are reading this, you're done.  Run 'model_object'[0].summary() to get summary statistics")

    return glm_results, x, y
Jordan
  • `df[weight]`'s weight is not defined, you'll need to pass it somehow – Raymond Reddington Mar 09 '20 at 12:45
    `df[weight]` tries to access `weight` before you assign to `weight = ...`… You don't define `weight` beforehand. If it's supposed to be passed as a `kwarg`, you need to access it as `kwargs['weight']`. – deceze Mar 09 '20 at 12:45
    It should be `weight = kwargs.get('weight', None)` – Guy Mar 09 '20 at 12:49
  • @Guy, the weight and offset arguments are strings, each naming a column in the dataframe. How does `kwargs.get('weight', None)` know that? I now get `IndexError: tuple index out of range` using the syntax you suggested. – Jordan Mar 09 '20 at 12:55
  • Did you perhaps mean `df['weight']`? (note the string) – GPhilo Mar 09 '20 at 12:56
    @GPhilo no, the call is in the traceback: `weight = 'eunit'` – wjandrea Mar 09 '20 at 12:57
    @Jordan You need to get the value of `weight` parameter from `kwargs` *before* you use it to get the data from the df. You have two actions here. `weight` is just a name, you can call it whatever you want. – Guy Mar 09 '20 at 12:58
  • So I changed `glm = sm.GLM(y, x, family = sm.families.Poisson(), var_weights = weight, offset = offset_factor)` to `glm = sm.GLM(y, x, family = sm.families.Poisson(), var_weights = df[weight], offset = df[offset_factor])`. It works when I pass column names but not when an argument is `None`; then it throws `KeyError: None`. – Jordan Mar 09 '20 at 13:03
    You want `and`, not `&`…!? – deceze Mar 09 '20 at 13:29
  • Use `is`, not `==`, when comparing with `None`. – chepner Mar 09 '20 at 13:34
  • @deceze, that's it! – Jordan Mar 09 '20 at 13:34
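
Pulling the comments together: `&` binds more tightly than `==` and `!=`, so `weight_value == None & offset_factor != None` evaluates `None & offset_factor` first, which is what raises the TypeError; `and` together with `is None` / `is not None` avoids that. Separately, the first branch (`if weight_value == None`) is taken whenever no weight is given, so an offset passed without a weight would be silently ignored. One way to sidestep the branching is to collect only the optional arguments that were actually supplied. The sketch below is an assumption-laden illustration, not the asker's final code: `build_glm_kwargs` is a hypothetical helper, and a plain dict stands in for the DataFrame so the snippet runs on its own.

```python
def build_glm_kwargs(df, weight=None, offset=None):
    """Return keyword arguments for sm.GLM, omitting whatever was not supplied.

    `weight` and `offset` are column names (strings) or None.
    """
    glm_kwargs = {}
    if weight is not None:
        glm_kwargs["var_weights"] = df[weight]
    if offset is not None:
        glm_kwargs["offset"] = df[offset]
    return glm_kwargs


# Hypothetical data; with pandas this would be the training DataFrame.
data = {"eunit": [1.0, 2.0], "log_exposure": [0.0, 0.1]}

print(build_glm_kwargs(data))
print(build_glm_kwargs(data, weight="eunit"))
print(build_glm_kwargs(data, offset="log_exposure"))

# Inside model_train the result would be unpacked into the constructor, e.g.
#   glm = sm.GLM(y, x, family=sm.families.Poisson(), **build_glm_kwargs(df, weight, offset))
```

Declaring `weight=None, offset=None` directly in the signature of `model_train` would also remove the need for `kwargs.get` altogether.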
