I am trying to convert an Excel model to Python. The model is set up as follows:
1 - there are 3 columns A, B, C which contain the input variables (column F contains the target variable). We would like to learn the coefficients in a logistic function for the values in columns A, B, and C.
2 - those columns are combined via a logistic function and the output of that is in column D.
3 - column E then holds the Z score of column D.
4 - Finally, a linear transformation is applied to the Z score in column E to produce the estimate.
This is set up in Excel via the Solver add-in, which finds the coefficients for A, B, and C that result in the estimate having the smallest sum of squared errors against column F. I would like to know how to set up such a thing in Python (scikit-learn, statsmodels, or some other package)?
The series of equations essentially looks like this:
    1 / (1 + e^-(A + B + C)) -> X
    z_score(X) * CONSTANT1 + CONSTANT2 -> estimate
`CONSTANT1` and `CONSTANT2` are known in advance. The goal is to minimize the squared error between `estimate` and `y`.
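To make the pipeline concrete, here is my understanding of the forward computation in NumPy. The sample data, the coefficients `a`, `b`, `c`, and the population z-score definition are my own stand-ins, not from the actual spreadsheet:

```python
import numpy as np

# Hypothetical example data standing in for spreadsheet columns A, B, C
rng = np.random.default_rng(0)
A = rng.normal(size=100)
B = rng.normal(size=100)
C = rng.normal(size=100)

CONSTANT1 = 10.0  # known in advance
CONSTANT2 = 50.0  # known in advance

def forward(a, b, c):
    """Replicate the spreadsheet: logistic combination, z-score, linear transform."""
    X = 1.0 / (1.0 + np.exp(-(a * A + b * B + c * C)))  # column D
    z = (X - X.mean()) / X.std()                        # column E (z-score of D)
    return z * CONSTANT1 + CONSTANT2                    # the estimate

estimate = forward(0.5, 1.0, -0.3)
```

Since the z-score has mean zero, the mean of `estimate` comes out as `CONSTANT2` regardless of the coefficients; it's only the shape of the estimates that the coefficients control.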
I have run regressions in scikit-learn before where I simply call `model.fit(X, y)`, where `X` is some sequence of input variables and `y` is a target output variable. But the sequence of steps in the Excel model doesn't seem to fit into a simple `model.fit` call. The code I wish I could write in Python/scikit-learn to accomplish this would look something like:

    model.fit(z_score(logistic(A + B + C)) * CONSTANT1 + CONSTANT2, y)

but I don't think this sort of thing is legal in any of the Python packages I know about (since scikit-learn wants the first parameter to `model.fit` to be numeric).
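The closest thing I can think of is writing the whole pipeline as a residual function and handing it to a generic optimizer, e.g. a sketch with `scipy.optimize.least_squares` (again with made-up stand-in data; the initial guess of all ones is arbitrary). I'm not sure whether this is the idiomatic approach:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical stand-in data for columns A, B, C and the target column F
rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(3, 100))
CONSTANT1, CONSTANT2 = 10.0, 50.0
y = rng.normal(loc=50.0, scale=10.0, size=100)  # stand-in for column F

def residuals(coefs):
    """Run the spreadsheet pipeline and return estimate - y for the optimizer."""
    a, b, c = coefs
    X = 1.0 / (1.0 + np.exp(-(a * A + b * B + c * C)))  # column D
    z = (X - X.mean()) / X.std()                        # column E (z-score)
    estimate = z * CONSTANT1 + CONSTANT2                # linear transform
    return estimate - y

# Minimize the sum of squared residuals over (a, b, c)
result = least_squares(residuals, x0=[1.0, 1.0, 1.0])
```

This does run, but I don't know if a canned regression class somewhere already covers this kind of composed model.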
Is what I'm trying to do possible? Also, what is this type of regression analysis called? I don't even know what to Google to get more information on this.