I am trying to convert an Excel model to Python. The model is set up as follows:
1 - there are 3 columns A, B, C which contain the input variables (column F contains the target variable). We would like to learn the coefficients in a logistic function for the values in columns A, B, and C.
2 - those columns are combined via a logistic function and the output of that is in column D.
3 - column E then holds the Z score of column D.
4 - Finally, a linear transformation is applied to the Z score in column E to produce the estimate.
This is set up in Excel via the Solver add-in, which finds the coefficients for A, B, and C that result in the estimate having the smallest sum of squared errors against column F. I would like to know how to set up such a thing in Python (scikit-learn, statsmodels, or some other package)?
The series of equations essentially looks like this:
    1 / (1 + e^-(A + B + C)) -> X
    z_score(X) * CONSTANT1 + CONSTANT2 -> estimate
`CONSTANT1` and `CONSTANT2` are known in advance. The goal is to minimize the squared error between `estimate` and `y`.
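To make the pipeline concrete, here is my understanding of the forward computation in NumPy. The sample data, the coefficients `a`, `b`, `c`, and the population z-score definition are my own stand-ins, not from the actual spreadsheet:

```python
import numpy as np

# Hypothetical example data standing in for spreadsheet columns A, B, C
rng = np.random.default_rng(0)
A = rng.normal(size=100)
B = rng.normal(size=100)
C = rng.normal(size=100)

CONSTANT1 = 10.0  # known in advance
CONSTANT2 = 50.0  # known in advance

def forward(a, b, c):
    """Replicate the spreadsheet: logistic combination, z-score, linear transform."""
    X = 1.0 / (1.0 + np.exp(-(a * A + b * B + c * C)))  # column D
    z = (X - X.mean()) / X.std()                        # column E (z-score of D)
    return z * CONSTANT1 + CONSTANT2                    # the estimate

estimate = forward(0.5, 1.0, -0.3)
```

Since the z-score has mean zero, the mean of `estimate` comes out as `CONSTANT2` regardless of the coefficients; it's only the shape of the estimates that the coefficients control.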
I have run regressions in scikit-learn before where I simply call `model.fit(X, y)`, where `X` is some sequence of input variables and `y` is a target output variable. But the sequence of steps in the Excel model doesn't seem to fit into a simple `model.fit` call. The code I wish I could write in Python/scikit-learn to accomplish this would look something like:

    model.fit(z_score(logistic(A + B + C)) * CONSTANT1 + CONSTANT2, y)

but I don't think this sort of thing is legal in any of the Python packages I know about (since scikit-learn wants the first parameter to `model.fit` to be numeric).
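The closest thing I can think of is writing the whole pipeline as a residual function and handing it to a generic optimizer, e.g. a sketch with `scipy.optimize.least_squares` (again with made-up stand-in data; the initial guess of all ones is arbitrary). I'm not sure whether this is the idiomatic approach:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical stand-in data for columns A, B, C and the target column F
rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(3, 100))
CONSTANT1, CONSTANT2 = 10.0, 50.0
y = rng.normal(loc=50.0, scale=10.0, size=100)  # stand-in for column F

def residuals(coefs):
    """Run the spreadsheet pipeline and return estimate - y for the optimizer."""
    a, b, c = coefs
    X = 1.0 / (1.0 + np.exp(-(a * A + b * B + c * C)))  # column D
    z = (X - X.mean()) / X.std()                        # column E (z-score)
    estimate = z * CONSTANT1 + CONSTANT2                # linear transform
    return estimate - y

# Minimize the sum of squared residuals over (a, b, c)
result = least_squares(residuals, x0=[1.0, 1.0, 1.0])
```

This does run, but I don't know if a canned regression class somewhere already covers this kind of composed model.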
Is what I'm trying to do possible? Also, what is this type of regression analysis called? I don't even know what to Google to get more information on this.