
Sorry, I am still fairly new to Python and hoping someone can help me with a curve-fitting issue...

I have an MxN dataframe "data", where M is the number of samples and N is the number of variables (10). I have an Mx1 dependent variable "Fractions": a fraction between 0 and 1 corresponding to each sample.

I know how to run a multiple linear regression between the N independent variables and the dependent variable Fractions; however, I wrapped the regression in a sigmoid function to keep responses between 0 and 1.

I was able to perform this like so...

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10):
    # multiple linear regression in the 10 columns of x, wrapped in a sigmoid
    y = 1 / (1 + np.exp(-(b0 + b1*x[:,0] + b2*x[:,1] + b3*x[:,2] + b4*x[:,3] + b5*x[:,4] +
                          b6*x[:,5] + b7*x[:,6] + b8*x[:,7] + b9*x[:,8] + b10*x[:,9])))
    return y

# data must be an (M, N) NumPy array here (e.g. data.values for a DataFrame)
popt, pcov = curve_fit(sigmoid, data, fractions)

# use coefficients from curve_fit to estimate a new fraction from new 1xN data
newFraction = sigmoid(newData, *popt)

However, I would like to implement some sort of stepwise multiple regression for feature selection, preferably based on AIC. I have found the following methods in Python...

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
https://planspace.org/20150423-forward_selection_with_statsmodels/
https://datascience.stackexchange.com/questions/24405/how-to-do-stepwise-regression-using-sklearn/24447#24447

But all of these methods rely on a regression that exposes a .fit() method. Is there a way to implement a model like the one above using any of the .fit() methods from statsmodels or lmfit? I have also looked into Lasso-type methods but, again, can't figure out how to implement my function.

Jkravz
  • So, in other words, are you trying to do feature selection with logistic regression? – Yohanes Gultom Aug 08 '18 at 00:57
  • Yes, in a way; however, it's not technically a logistic regression because my responses are not binary. I am trying to predict a response between 0 and 1; the model is essentially a multiple linear regression, just wrapped in a sigmoid, if that makes sense. – Jkravz Aug 08 '18 at 08:16
  • Ah, yes, my bad. I missed the "between" part and forgot what sigmoid is. If you want to use RFE, I think you can try to wrap your existing model in a scikit-learn estimator class (see the sketch below): https://stackoverflow.com/questions/51679173/using-sklearn-rfe-with-an-estimator-from-another-package/51686174#51686174 – Yohanes Gultom Aug 08 '18 at 08:30
  • Thanks, I will take a look and get back. – Jkravz Aug 08 '18 at 09:00
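
As a rough sketch of that wrapper idea (the class name SigmoidRegressor and the generic coefficient handling here are illustrative assumptions), a scikit-learn-compatible estimator around curve_fit might look like:

import numpy as np
from scipy.optimize import curve_fit
from sklearn.base import BaseEstimator, RegressorMixin

class SigmoidRegressor(BaseEstimator, RegressorMixin):
    """Multiple linear regression wrapped in a sigmoid, fit via curve_fit."""

    def _model(self, X, *coefs):
        # coefs[0] is the intercept; coefs[1:] multiply the columns of X
        z = coefs[0] + X @ np.asarray(coefs[1:])
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        p0 = np.zeros(X.shape[1] + 1)   # intercept plus one slope per column
        popt, _ = curve_fit(self._model, X, y, p0=p0)
        self.intercept_ = popt[0]
        self.coef_ = popt[1:]           # exposed so RFE can rank features
        return self

    def predict(self, X):
        return self._model(np.asarray(X, dtype=float), self.intercept_, *self.coef_)

Because it exposes coef_, this can be passed straight to sklearn.feature_selection.RFE, e.g. RFE(SigmoidRegressor(), n_features_to_select=5).fit(data.values, fractions).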

1 Answer


I think you should be able to do something like this with lmfit (incomplete because your example is incomplete):

import numpy as np
from lmfit import Model

def sigmoid(x, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10):
    y = 1 / (1 + np.exp(-(b0 + b1*x[:,0] + b2*x[:,1] + b3*x[:,2] + b4*x[:,3] + b5*x[:,4] +
                          b6*x[:,5] + b7*x[:,6] + b8*x[:,7] + b9*x[:,8] + b10*x[:,9])))
    return y

smodel = Model(sigmoid)
params = smodel.make_params(b0=1, b1=0, b2=0.1, b3=0.01, b4=0.01, b5=0.01,
                            b6=0.01, b7=0.01, b8=0.01, b9=0.01, b10=0.01)

# fit fractions (the dependent variable) with data as the independent variable x
result = smodel.fit(fractions, params, x=data)

That will fit by minimizing chi-square.
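
As with your curve_fit version, the fitted result can then be evaluated on new data. A minimal sketch, assuming newData is a 1xN array with the same column layout as data:

# estimate a new fraction from the fitted model, and inspect the fit
newFraction = result.eval(x=newData)
print(result.fit_report())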

I believe that for any particular fit (for which there will be a fixed number of data points and a fixed number of variables), minimizing chi-square will also minimize AIC, since chi-square is non-negative and AIC = 2*Nvarys + Ndata * log(chi_square / Ndata).
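
Note that lmfit computes this for you: the fit result carries aic and bic attributes, so there is no need to evaluate the formula by hand:

# AIC and BIC are available directly on the lmfit result
print(result.aic, result.bic)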

M Newville
  • Thank you for your response. This does work, and I retrieve the same model coefficients as I do with curve_fit. Do you have any recommendations now on how to use the AIC in one of the stepwise methods listed above? – Jkravz Aug 08 '18 at 09:00
  • Well, you could certainly loop through the Parameters, making each one fixed or variable, redo the fit, and then compare the AIC values to determine whether that parameter is needed to explain the data (rough sketch below). I'm sure that could be abused, but that seems to be a typical concern for stepwise regression. – M Newville Aug 09 '18 at 01:45
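
A minimal sketch of that loop, reusing smodel, params, data, and fractions from the answer (illustrative only: it fixes each coefficient at zero in turn and keeps the simpler model whenever the AIC improves):

# backward elimination: fix each coefficient at 0 and keep it fixed if AIC improves
best = smodel.fit(fractions, params, x=data)
for name in list(params):
    if name == 'b0':
        continue                      # always keep the intercept
    trial = best.params.copy()
    trial[name].set(value=0.0, vary=False)
    candidate = smodel.fit(fractions, trial, x=data)
    if candidate.aic < best.aic:
        best = candidate              # this term is not needed
print(best.fit_report())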