10

I am struggling to implement a linear regression in pymc3 with a custom likelihood.

I previously posted this question on CrossValidated & it was recommended to post here as the question is more code orientated (closed post here)

Suppose you have two independent variables x1, x2 and a target variable y, as well as an indicator variable called delta.

  • When delta is 0, the likelihood function is standard least squares
  • When delta is 1, the likelihood function is the least squares contribution only when the target variable is greater than the prediction

enter image description here

Example snippet of observed data:

x_1  x_2     observed_target  
10    1   0   100              
20    2   0   50               
5    -1   1   200             
10   -2   1   100             

Does anyone know how this can be implemented in pymc3? As a starting point...

model =  pm.Model()
with model as ttf_model:

  intercept = pm.Normal('param_intercept', mu=0, sd=5)
  beta_0 = pm.Normal('param_x1', mu=0, sd=5)
  beta_1 = pm.Normal('param_x2', mu=0, sd=5)
  std = pm.HalfNormal('param_std', beta = 0.5)

  x_1 = pm.Data('var_x1', df['x1'])
  x_2 = pm.Data('var_x2', df['x2'])

  mu = (intercept + beta_0*x_0 + beta_1*x_1)
user1635327
  • 1,469
  • 3
  • 11
Mike Tauber
  • 137
  • 1
  • 12
  • I think the `switch` function will do that. Here is one example usage: https://discourse.pymc.io/t/sum-distributions-based-on-another-distribution/7976/2 – OriolAbril Sep 02 '21 at 17:44
  • Hmm - are you able to give an example of how the switch function can be used with a custom likelihood? – Mike Tauber Sep 06 '21 at 13:46
  • 2
    Silly question: how is delta any different than having a third independant variable x_3? Could you not get an accurate prediction using 3 independant variables? – Shabbir Hussain Sep 09 '21 at 15:24

1 Answers1

2

In case this is helpful, from reading the docs it looks like something along these lines might work, but I have not been able to test it and it was too long to pop into a comment.

model =  pm.Model()
with model as ttf_model:

  intercept = pm.Normal('param_intercept', mu=0, sd=5)
  beta_0 = pm.Normal('param_x1', mu=0, sd=5)
  beta_1 = pm.Normal('param_x2', mu=0, sd=5)
  std = pm.HalfNormal('param_std', beta = 0.5)

  x_1 = pm.Data('var_x1', df['x1'])
  x_2 = pm.Data('var_x2', df['x2'])
  delta = pm.Data('delta', df['delta'])  # Or whatever this column is
  target = pm.Data('target', df['observed_target'])
  
  ypred = (intercept + beta_0*x_0 + beta_1*x_1)  # Intermediate result
  target_ge_ypred = pm.math.ge(target, ypred)  # Compare target to intermediate result
  zero = pm.math.constant(0)  # Use this if delta==1 and target<ypred
  
  # EDIT: Check delta
  alternate = pm.math.switch(target_ge_ypred, ypred, zero)  # Alternative result
  mu = pm.math.switch(pm.math.eq(delta, zero), ypred, alternate)  # Actual result wanted?
TMBailey
  • 557
  • 3
  • 14
  • I wonder if it will be helpful to define `ypred` using `pm.Deterministic`. Also, instead of equating `alternate` to `zero` maybe it should be set to `target` (when `target_ge_ypred`); this might achieve the desired effect. – TMBailey Oct 31 '21 at 08:32