I have two variables (x and y) that have a somewhat sigmoidal relationship with each other, and I need to find some sort of prediction equation that will enable me to predict the value of y, given any value of x. My prediction equation needs to show the somewhat sigmoidal relationship between the two variables. Therefore, I cannot settle for a linear regression equation that produces a line. I need to see the gradual, curvilinear change in slope that occurs at both the right and left of the graph of the two variables.
I started using numpy.polyfit after googling curvilinear regression and python, but that gave me the awful results you can see if you run the code below. Can anyone show me how to re-write the code below to get the type of sigmoidal regression equation that I want?
If you run the code below, you can see that it gives a downward-facing curve, which is not what the relationship between my variables should look like. Instead, there should be more of a sigmoidal relationship between my two variables, with a tight fit to the data that I am using in the code below. The data in the code below are means from a large-sample research study, so they pack more statistical power than their five data points might suggest. I do not have the actual data from the large-sample research study, but I do have the means below and their standard deviations (which I am not showing). I would prefer to just plot a simple function with the mean data listed below, but the code could get more complex if complexity would offer substantial improvements.
How can I change my code to show a best fit of a sigmoidal function, preferably using scipy, numpy, and python? Here is the current version of my code, which needs to be fixed:
import numpy as np
import matplotlib.pyplot as plt
# Create numpy data arrays
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])
# Use polyfit and poly1d to create the regression equation
z = np.polyfit(x, y, 3)
p = np.poly1d(z)
xp = np.linspace(100, 1600, 1500)
pxp=p(xp)
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(140,310)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
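For reference, here is a minimal sketch of the kind of fit being asked for, using scipy.optimize.curve_fit instead of polyfit. The four-parameter logistic form and the starting values in p0 are assumptions picked by eye from the five means; they are not prescribed by the study, and other sigmoid forms may suit the data equally well.

```python
import numpy as np
from scipy.optimize import curve_fit

# The five means from the question
x = np.array([821, 576, 473, 377, 326], dtype=float)
y = np.array([255, 235, 208, 166, 157], dtype=float)

def sigmoid(x, x0, y0, c, k):
    """Logistic curve (assumed form): lower plateau y0, height c,
    midpoint x0, steepness k."""
    return c / (1.0 + np.exp(-k * (x - x0))) + y0

# Rough starting values read off the data by eye (assumption)
p0 = (500.0, 150.0, 100.0, 0.01)

# curve_fit adjusts the parameters to minimize the squared residuals
popt, pcov = curve_fit(sigmoid, x, y, p0=p0, maxfev=10000)

sse_start = np.sum((y - sigmoid(x, *p0)) ** 2)
sse_fit = np.sum((y - sigmoid(x, *popt)) ** 2)
print("fitted x0, y0, c, k:", popt)
print("sum of squared residuals: start =", sse_start, " fit =", sse_fit)

# The fitted curve can then be evaluated at any x for prediction
xp = np.linspace(100, 1600, 1500)
pxp = sigmoid(xp, *popt)
```

With only five points and four parameters there is a single degree of freedom left, so the curve will hug the means closely; that closeness says little about how well it would predict new data.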
EDIT BELOW: (Re-framed the question)
Your response, and its speed, are very impressive. Thank you, unutbu. But, in order to produce more valid results, I need to re-frame my data values. This means re-casting x values as a percentage of the max x value, while re-casting y values as a percentage of the x-values in the original data. I tried to do this with your code, and came up with the following:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
# Create numpy data arrays
# Original data (commented out)
#x = np.array([821,576,473,377,326])
#y = np.array([255,235,208,166,157])
# Re-calculate x values as a percentage of the first (maximum)
# original x value above
x = np.array([1.000,0.702,0.576,0.459,0.397])
# Recalculate y values as a percentage of their respective x values
# from original data above
y = np.array([0.311,0.408,0.440,0.440,0.482])
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y
def residuals(p,x,y):
    return y - sigmoid(p,x)
p_guess=(600,200,100,0.01)
# full_output=1 makes leastsq return the extra diagnostic values
p, cov, infodict, mesg, ier = scipy.optimize.leastsq(
    residuals, p_guess, args=(x,y), full_output=1)
# Original xp commented out to allow for better scaling of
# new values
#xp = np.linspace(100, 1600, 1500)
xp = np.linspace(0, 1.1, 1100)
pxp=sigmoid(p,xp)
x0,y0,c,k=p
print('''\
x0 = {x0}
y0 = {y0}
c = {c}
k = {k}
'''.format(x0=x0,y0=y0,c=c,k=k))
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
Can you show me how to fix this revised code?
NOTE: By re-casting the data, I have essentially rotated the 2d (x,y) sigmoid about the z-axis by 180 degrees. Also, the 1.000 is not really a maximum of the x values. Instead, 1.000 is a mean of the range of values from different test participants in a maximum test condition.
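One thing worth checking in the revised code is that p_guess still holds values on the scale of the original data (hundreds), while the re-cast x and y now lie between 0 and 1. The sketch below reproduces the re-casting from the original means and then builds starting values from the re-cast data itself. The heuristics used for the guess (mean of x for the midpoint, the y range for the height, a negative k because y now falls as x rises) are assumptions for illustration, not a rule from the scipy documentation.

```python
import numpy as np
import scipy.optimize

x_orig = np.array([821, 576, 473, 377, 326], dtype=float)
y_orig = np.array([255, 235, 208, 166, 157], dtype=float)

# Re-cast: x as a fraction of the largest x, y as a fraction of its own x
x = x_orig / x_orig.max()
y = y_orig / x_orig

def sigmoid(p, x):
    x0, y0, c, k = p
    return c / (1.0 + np.exp(-k * (x - x0))) + y0

def residuals(p, x, y):
    return y - sigmoid(p, x)

# Starting values on the same scale as the re-cast data (heuristic assumption)
p_guess = (
    x.mean(),                     # x0: a midpoint inside the x range
    y.min(),                      # y0: lower plateau
    y.max() - y.min(),            # c: height of the step
    -4.0 / (x.max() - x.min()),   # k: negative, since y decreases with x
)

p, ier = scipy.optimize.leastsq(residuals, p_guess, args=(x, y))

sse_guess = np.sum(residuals(p_guess, x, y) ** 2)
sse_fit = np.sum(residuals(p, x, y) ** 2)
print("re-cast x:", np.round(x, 3))
print("re-cast y:", np.round(y, 3))
print("ier =", ier, " p =", p)
print("sum of squared residuals: guess =", sse_guess, " fit =", sse_fit)
```

An ier value of 1, 2, 3, or 4 indicates that leastsq believes it found a solution; any other value means the result should be treated with suspicion.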
SECOND EDIT BELOW:
Thank you, unutbu. I carefully read through your code and looked up aspects of it in the scipy documentation. Since your name seems to pop up as a writer of the scipy documentation, I am hoping you can answer the following questions:
1.) Does leastsq() call residuals(), which then returns the difference between the input y-vector and the y-vector returned by the sigmoid() function? If so, how does it account for the difference in the lengths of the input y-vector and the y-vector returned by the sigmoid() function?
2.) It looks like I can call leastsq() for any math equation, as long as I access that math equation through a residuals function, which in turn calls the math function. Is this true?
3.) Also, I notice that p_guess has the same number of elements as p. Does this mean that the four elements of p_guess correspond, in order, to x0, y0, c, and k?
4.) Is the p that is sent as an argument to the residuals() and sigmoid() functions the same p that will be output by leastsq(), and is leastsq() using that p internally before returning it?
5.) Can p and p_guess have any number of elements, depending on the complexity of the equation being used as a model, as long as the number of elements in p is equal to the number of elements in p_guess?
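These questions can be probed empirically with a small sketch. It assumes a two-parameter straight-line model (a hypothetical stand-in chosen only because its answer can be checked against np.polyfit) and counts how often leastsq() calls residuals(). It is an experiment, not a statement of how scipy is implemented internally.

```python
import numpy as np
import scipy.optimize

x = np.array([821, 576, 473, 377, 326], dtype=float)
y = np.array([255, 235, 208, 166, 157], dtype=float)

calls = []  # records every parameter vector that leastsq tries

def line(p, x):
    """Illustrative two-parameter model: slope m, intercept b."""
    m, b = p
    return m * x + b

def residuals(p, x, y):
    calls.append(np.array(p, dtype=float))
    # The model is evaluated at the data's own x values, so the result
    # has exactly one entry per data point, the same length as y.
    return y - line(p, x)

# One starting value per parameter, in the order the model unpacks them: (m, b)
p_guess = (0.1, 100.0)
first_residuals = residuals(p_guess, x, y)
print("residual vector length:", first_residuals.shape, "data length:", y.shape)

# leastsq repeatedly calls residuals(p, x, y) with trial values of p
p, ier = scipy.optimize.leastsq(residuals, p_guess, args=(x, y))
m, b = p
print("number of residuals() calls:", len(calls))
print("fitted m, b:", m, b)
print("np.polyfit check:", np.polyfit(x, y, 1))
```

Swapping line() for any other function of (p, x) works the same way, as long as p_guess supplies one starting value for each parameter the function unpacks and there are at least as many data points as parameters.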