
In a regression framework, suppose we have two independent variables x1 and x2, and we want a different slope depending on whether x1>0 or x1<0, and the same for x2. This sort of model is used in the computation of the dual beta, if you need an entry point into the literature.
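
For concreteness, the model I have in mind looks roughly like this (my own notation, with a separate slope for the positive and the negative part of each variable):

y = b0 + b1_pos * x1 * 1(x1 >= 0) + b1_neg * x1 * 1(x1 < 0)
       + b2_pos * x2 * 1(x2 >= 0) + b2_neg * x2 * 1(x2 < 0) + error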

This topic has been discussed on the Cross Validated site (Link), so now I am trying to code it. My first attempt uses statsmodels with a classic linear regression (OLS) model:

import numpy as np
import statsmodels.api as sm

# Load the example Spector dataset and append an intercept column
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)

# Fit and summarize OLS model
mod = sm.OLS(spector_data.endog, spector_data.exog)

res = mod.fit()
print(res.summary())

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4639      0.162      2.864      0.008       0.132       0.796
x2             0.0105      0.019      0.539      0.594      -0.029       0.050
x3             0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================

How would it be possible to implement the positive and negative effects, assuming the relationship is asymmetric, so that we can quantify them (dual beta coefficients)?

As an expected output format, we would have something like this (fictitious values for the sake of exemplification):

==============================================================================
              coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1+            0.1031      0.162      2.864      0.008       0.132       0.796
x1-            0.4639      0.162      2.864      0.008       0.132       0.796
x2+            0.0111      0.019      0.539      0.594      -0.029       0.050
x2-            0.212       0.019      0.539      0.594      -0.029       0.050
x3             0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================
PeCaDe
  • AFAIR, one way to do this is to include both [x1, x1 * (x1>0)] as regressors; then the first coefficient is the slope for the negative part and the second is the difference between the positive and negative slopes, i.e. an interaction of x1 with a dummy variable for `x1>0`. Alternatively, add [x1 * (x1<0), x1 * (x1>0)] as regressors so that the second coefficient is the positive-part slope directly (see the sketch after these comments). – Josef Oct 15 '22 at 14:25
  • Hi @Josef, thanks for your comment; it points the thread in the right direction, but even though this is a simple topic, as you mention, there are several ways to approach it. Also, this is the first time the topic has been addressed on the site, so it would be good to have an answer with a worked example. – PeCaDe Oct 15 '22 at 17:08
  • @Josef, I think I mostly understand your comment. But what if we only have positive values? Does that mean the effect is symmetric? I have seen dual-beta results on datasets where X is always > 0. I am still thinking about the implementation, as I am not entirely sure about the notation. – PeCaDe Oct 17 '22 at 14:48
  • If you don't have negative values, then there is no information in the data about a separate negative effect. Extrapolating to a negative effect requires an assumption about what the slope would be there; assuming the same slope as for the positive effect is one possible assumption. – Josef Oct 17 '22 at 17:47
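
A minimal sketch of the two parameterizations Josef describes, on made-up data (the series x1 and y and all coefficient values are invented for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
# true slopes: 0.5 on the positive side, 1.5 on the negative side
y = 1.0 + 0.5 * np.where(x1 > 0, x1, 0) + 1.5 * np.where(x1 <= 0, x1, 0) + rng.normal(scale=0.1, size=200)

# Parameterization 1: [x1, x1 * (x1 > 0)]
# the coefficient on x1 is the negative-side slope,
# the coefficient on the interaction is the positive-minus-negative difference
X1 = sm.add_constant(np.column_stack([x1, x1 * (x1 > 0)]))
print(sm.OLS(y, X1).fit().params)

# Parameterization 2: [x1 * (x1 < 0), x1 * (x1 > 0)]
# each coefficient is the slope on its own side directly
X2 = sm.add_constant(np.column_stack([x1 * (x1 < 0), x1 * (x1 > 0)]))
print(sm.OLS(y, X2).fit().params)

Both parameterizations fit the same model; they just report the positive-side effect either as a difference from the negative side or directly.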

1 Answer

From research, there are at least two possibilities.

  • Split the variables according to X>=0 | X<0, which is related to the link provided in the question:

    df["GPA+"] = (df["GPA"] >= 0) * df["GPA"]

    df["GPA-"] = (df["GPA"] < 0) * df["GPA"]

  • When a time attribute is available, the dual beta can instead be based on the increments/decrements of the variable through time. This amounts to differencing the column so that an estimate can be computed for each regime:

    df["diff_GPA"] = df["GPA"].diff(period=1)

    df["diff_GPA+"] = (df["diff_GPA"] >= 0) * df["GPA"]

    df["diff_GPA-"] = (df["diff_GPA"] < 0) * df["GPA"]

In both cases, depending on the nature of the dataset, a dual beta can be computed after this feature-engineering step, and the resulting estimates can be interpreted from the OLS summary.
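
For completeness, here is a minimal end-to-end sketch of the first approach on the Spector data from the question. Since GPA is strictly positive in that dataset, the split below is done on GPA centered around its mean purely for illustration; on data that already changes sign (e.g. returns) you would split the raw variable:

import statsmodels.api as sm

# load the Spector data as pandas objects (exog columns: GPA, TUCE, PSI)
data = sm.datasets.spector.load_pandas()
df = data.exog.copy()

# center GPA so that it changes sign, then keep each side in its own column
gpa_c = df["GPA"] - df["GPA"].mean()
df["GPA+"] = (gpa_c >= 0) * gpa_c
df["GPA-"] = (gpa_c < 0) * gpa_c

# replace GPA by its two signed parts, keep the other regressors and a constant
X = sm.add_constant(df[["GPA+", "GPA-", "TUCE", "PSI"]], prepend=False)

res = sm.OLS(data.endog, X).fit()
print(res.summary())

The summary then contains separate GPA+ and GPA- rows, in the spirit of the expected output shown in the question.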

mikele