Error when trying to get the summary of a regression

Question

I was trying to run a Ridge Regression, just like this:

import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, Ridge
from regressors import stats

alphas = np.linspace(.00001, 100, 1000)

rr_scaled = RidgeCV(alphas=alphas, cv=5, normalize=True)

rr_scaled.fit(X_train, y_train)

It works fine, so I went to get the summary:

stats.summary(rr_scaled, X_train, y_train)

But I keep running into this error:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 10

What does this mean? Is anything wrong with the syntax?

I found this post: p-values from ridge regression in python, but it does exactly what I was doing, and in that post it works!

The first thing that comes to mind is that the structure of your training arrays is invalid for `stats.summary`. Also, I checked the [documentation](https://regressors.readthedocs.io/en/latest/modules.html#regressors.stats.summary), and it states that the first argument of the function should be a `linear model`. I'm not sure whether a ridge regression model counts as one, so pay attention to that. - oo00oo00oo00, Jun 05 '20 at 12:15

score 1 · Answer 1 · answered Jun 05 '20 at 14:06

The problem seems to be that regressors expects your data to be in a particular shape. Specifically, it seems to expect your target variable to be a 1-D array of shape (n_samples,), not a 2-D column "matrix" of shape (n_samples, 1).

Consider the following example, which is based on your code:

import numpy as np
import pandas as pd
from regressors import stats
from sklearn.linear_model import LinearRegression, RidgeCV, Ridge

n_features = 3
n_samples = 10
X_train = np.random.normal(0, 1, size=(n_samples, n_features))
y_train = np.random.randn(n_samples)

alphas = np.linspace(.00001, 100, 1000)

rr_scaled = RidgeCV(alphas=alphas, cv=5, normalize=True)
rr_scaled.fit(X_train, y_train)

stats.summary(rr_scaled, X_train, y_train)

If I run it, it executes fine and outputs:

Residuals:
    Min      1Q  Median    3Q     Max
-2.5431 -0.8815 -0.0059  0.69  2.2218


Coefficients:
            Estimate  Std. Error  t value   p value
_intercept  0.213519    0.463767   0.4604  0.656149
x1          0.001617    0.761174   0.0021  0.998351
x2          0.006398    0.895701   0.0071  0.994457
x3         -0.003119    0.518982  -0.0060  0.995335
---
R-squared:  0.00267,    Adjusted R-squared:  -0.49599
F-statistic: 0.01 on 3 features

Now, if I change the target to a "matrix" shape:

y_train = np.random.randn(n_samples).reshape((-1, 1))

I get the same error you got:

Traceback (most recent call last):
  File "a.py", line 16, in <module>
    stats.summary(rr_scaled, X_train, y_train)
  File "lib/python3.8/site-packages/regressors/stats.py", line 252, in summary
    coef_df['Estimate'] = np.concatenate(
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 3
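
To see where the mismatch may come from, here is a minimal NumPy-only sketch. It assumes that a scikit-learn linear model fitted on a 2-D target exposes `coef_` with shape (1, n_features) and `intercept_` with shape (1,), and that `stats.summary` stacks the intercept on top of the coefficients roughly as the traceback suggests. The helper `stack_estimates` is hypothetical and only imitates that step; it is not the library's actual code.

```python
import numpy as np

n_features = 3

# Assumed shapes after fitting on a 1-D target of shape (n_samples,)
intercept_1d = 0.2                     # plain scalar
coef_1d = np.zeros(n_features)         # shape (n_features,)

# Assumed shapes after fitting on a 2-D target of shape (n_samples, 1)
intercept_2d = np.array([0.2])         # shape (1,)
coef_2d = np.zeros((1, n_features))    # shape (1, n_features)


def stack_estimates(intercept, coef):
    # Hypothetical stand-in for the concatenation shown in the traceback
    return np.concatenate((np.array([intercept]), coef))


# 1-D case: shapes (1,) and (3,) join into a single vector of length 4
ok = stack_estimates(intercept_1d, coef_1d)
print(ok.shape)

# 2-D case: shapes (1, 1) and (1, 3) disagree along dimension 1
try:
    stack_estimates(intercept_2d, coef_2d)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```

Under these assumptions, the 2-D case fails with the same kind of message: index 0 has size 1 and index 1 has size n_features along dimension 1.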

So, in your particular case, you probably need to do this:

y_train = y_train.reshape((-1,))
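
If `y_train` is a pandas object rather than a NumPy array, it may not have a `reshape` method. A slightly more defensive sketch converts to NumPy first; the helper name `as_1d_target` is made up for illustration and is not part of regressors or scikit-learn.

```python
import numpy as np


def as_1d_target(y):
    """Return y as a 1-D NumPy array of shape (n_samples,)."""
    arr = np.asarray(y)  # accepts lists, NumPy arrays, pandas Series/DataFrames
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.ravel()  # flatten a single column to 1-D
    raise ValueError(f"expected a single target column, got shape {arr.shape}")


# Example: a column "matrix" becomes a flat array
y_column = np.random.randn(10).reshape((-1, 1))
y_flat = as_1d_target(y_column)
print(y_column.shape, y_flat.shape)
```

You would then fit and summarize with the flattened target, for example `rr_scaled.fit(X_train, as_1d_target(y_train))` followed by `stats.summary(rr_scaled, X_train, as_1d_target(y_train))`.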
