2

How do I calculate the Pearson correlation (r) and the beta coefficient of every column in my dataframe against a dependent variable?

A    B    C    D    Sales
1    0    1    1    10
0    0    1    1    9
1    1    1    0    15

Here A through D are the independent variables and Sales is the dependent variable. I want to find r and the beta coefficient for every column (attribute).

2 Answers

1

Use this for correlation:

df.corr()['Sales'][:-1]

Or, if your dataframe is very large, it may be more efficient to compute only the correlations with Sales instead of the full correlation matrix:

df[df.columns[:-1]].apply(lambda x:x.corr(df['Sales']))

output:

A    0.628619
B    0.987829
C         NaN
D   -0.987829
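
To make the output above reproducible, here is a minimal sketch that types in the three rows from the question by hand (the `df` construction is my assumption about how the data is loaded). It also shows why C comes out as NaN: the column is constant, so its standard deviation is zero and r is undefined.

```python
import pandas as pd

# The three rows from the question, entered by hand
df = pd.DataFrame({
    "A": [1, 0, 1],
    "B": [0, 0, 1],
    "C": [1, 1, 1],
    "D": [1, 1, 0],
    "Sales": [10, 9, 15],
})

# Pearson r of every column against Sales; dropping by label
# avoids relying on Sales being the last column
r = df.corr()["Sales"].drop("Sales")
print(r)

# C never varies, so its standard deviation is 0 and r is NaN
print(df["C"].std())
```

Dropping `Sales` by label instead of slicing with `[:-1]` is only a convenience; both give the same result when Sales is the last column.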

And inspired by this answer for beta:

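import numpy as np
import pandas as pd

def beta(df):
    # the last column is Sales; it is used as the regressor X
    X = df.values[:, [-1]]
    # prepend a column of ones for the intercept
    X = np.concatenate([np.ones_like(X), X], axis=1)
    # closed-form least squares, with every other column as a target
    b = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(df.values[:, :-1])
    # row 0 of b holds the intercepts, row 1 the slopes
    return pd.Series(b[1], df.columns[:-1], name='Beta')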

print(beta(df))

output:

A    1.129032e-01
B    1.774194e-01
C    1.110223e-15
D   -1.774194e-01

EXPLANATION:

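The last column, Sales, is taken as X, and a column of 1s is prepended to X to act as the intercept. The closed-form least-squares solution then gives the coefficients for all the other columns at once, and the slopes are returned as a pandas Series indexed by column name. Note that, as written, this regresses each column on Sales, not Sales on each column.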
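
For a single regressor with an intercept, the least-squares slope reduces to cov(x, y) / var(x), which gives a quick way to cross-check the numbers. The sketch below assumes the question's three rows typed in by hand, and the variable names are mine. It computes the slope in both directions: each column regressed on Sales (which is what `beta()` does) and Sales regressed on each column (which may be closer to what the question asks, since Sales is the dependent variable).

```python
import pandas as pd

# The three rows from the question, entered by hand
df = pd.DataFrame({
    "A": [1, 0, 1],
    "B": [0, 0, 1],
    "C": [1, 1, 1],
    "D": [1, 1, 0],
    "Sales": [10, 9, 15],
})

# Sample covariance of every attribute with Sales
cov_with_sales = df.cov()["Sales"].drop("Sales")

# Slope of column ~ Sales: cov / var(Sales), same direction as beta()
beta_col_on_sales = cov_with_sales / df["Sales"].var()

# Slope of Sales ~ column: cov / var(column), Sales as the dependent variable
beta_sales_on_col = cov_with_sales / df.drop(columns="Sales").var()

print(beta_col_on_sales)
print(beta_sales_on_col)
```

Each slope here comes from a separate one-variable regression, not from one joint regression on A through D. The constant column C has zero variance, so its slope in the second direction is undefined and shows up as NaN.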

Ehsan
0

You should check the docs for numpy, pandas, and scikit-learn, each of which has functions to get these.

For correlations, there's numpy.corrcoef in numpy and pd.Series.corr in pandas.
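
A small sketch of the two calls just mentioned, using columns A and Sales from the question (values typed in by hand, so treat the setup as an assumption). `numpy.corrcoef` returns the full 2x2 correlation matrix, so the off-diagonal entry is the one you want; `Series.corr` returns the scalar directly.

```python
import numpy as np
import pandas as pd

# Two columns from the question, entered by hand
a = pd.Series([1, 0, 1], name="A")
sales = pd.Series([10, 9, 15], name="Sales")

# numpy: 2x2 matrix, r sits off the diagonal
r_numpy = np.corrcoef(a, sales)[0, 1]

# pandas: scalar Pearson r (the default method)
r_pandas = a.corr(sales)

print(r_numpy, r_pandas)
```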

For regression coefficients, check out statsmodels or sklearn.linear_model in scikit-learn.
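
As a rough sketch of the scikit-learn route, assuming scikit-learn is installed and that a separate one-variable regression of Sales on each column is what's wanted (with only three rows, a joint regression on all four columns is not identifiable). The data is the question's three rows typed in by hand, and the loop and the name `betas` are mine.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The three rows from the question, entered by hand
df = pd.DataFrame({
    "A": [1, 0, 1],
    "B": [0, 0, 1],
    "C": [1, 1, 1],
    "D": [1, 1, 0],
    "Sales": [10, 9, 15],
})

y = df["Sales"]
betas = {}
for col in df.columns.drop("Sales"):
    # Fit Sales ~ intercept + col, one attribute at a time
    model = LinearRegression().fit(df[[col]], y)
    betas[col] = model.coef_[0]

betas = pd.Series(betas, name="Beta")
print(betas)
```

Since C is constant, its coefficient carries no information about Sales whatever value the fit reports for it.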

Elliott Collins