I have a dataframe with the independent variable values in the column headers, and each row is a separate set of dependent variables:
5.032530 6.972868 8.888268 10.732009 12.879130 16.877655
0 2.512298 2.132748 1.890665 1.583538 1.582968 1.440091
1 5.628667 4.206962 4.179009 3.162677 3.132448 1.887631
2 3.177090 2.274014 2.412432 2.066641 1.845065 1.574748
3 5.060260 3.793109 3.129861 2.617136 2.703114 1.921615
4 4.153010 3.354411 2.706463 2.570981 2.020634 1.646298
I would like to fit a curve of the form Y = A*x^B to each row. I need to solve for A and B for about 5000 rows, with 6 data points in each row. I was able to do this using DataFrame.apply, but it takes about 40 seconds. Can I speed this up with Cython or by vectorizing somehow? I need precision to about 4 decimal places.
Here is what I have:
import pandas as pd
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r'C:\File.csv')
def curvefita(y):
    return curve_fit(lambda x,a,b: a*np.power(x,b), df.iloc[:,3:].columns, y,p0=[8.4,-.58], bounds=([0,-10],[200,10]),maxfev=2000)[0][0]
def curvefitb(y):
    return curve_fit(lambda x,a,b: a*np.power(x,b), df.iloc[:,3:].columns, y,p0=[8.4,-.58], bounds=([0,-10],[200,10]),maxfev=2000)[0][1]
avalues = df.iloc[:,3:].apply(curvefita, axis=1)
bvalues = df.iloc[:,3:].apply(curvefitb, axis=1)
df['a']=avalues
df['b']=bvalues
colcount = len(df.columns)
#build power fit - make the matrix
powerfit = df.copy()
for column in range(colcount-2):
    powerfit.iloc[:,column] = powerfit.iloc[:,colcount-2] * (powerfit.columns[column]**powerfit.iloc[:,colcount-1])
#graph an example
plt.plot(powerfit.iloc[0,:colcount-2],'r')
plt.plot(df.iloc[0,:colcount-2],'ro')
#another example looked up by ticker
plt.plot(powerfit.iloc[5,:colcount-2],'b')
plt.plot(df.iloc[5,:colcount-2],'bo')
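For reference, here is a minimal sketch of the kind of vectorization I have in mind. It is not equivalent to the curve_fit calls above. Since Y = A*x^B becomes log(Y) = log(A) + B*log(x), all rows can be fitted at once with closed-form linear least squares in log space. This assumes every x and Y value is strictly positive, and it minimizes the error in log space, so on noisy data the results will differ somewhat from curve_fit and may not meet the 4-decimal requirement on their own. The function name fit_power_law_rows and the synthetic data (built from the column headers shown above) are hypothetical and only for illustration.

```python
import numpy as np

def fit_power_law_rows(x, Y):
    """Fit y = a * x**b to every row of Y at once (least squares in log space).

    x: 1-D array of positive independent values, shape (m,)
    Y: 2-D array of positive dependent values, shape (n_rows, m)
    Returns (a, b), each of shape (n_rows,).
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_Y = np.log(np.asarray(Y, dtype=float))

    # Centered regressor, shared by every row.
    dx = log_x - log_x.mean()

    # Slope of the log-log regression line for each row: this is b.
    b = (log_Y - log_Y.mean(axis=1, keepdims=True)) @ dx / (dx @ dx)

    # Intercept for each row is log(a).
    log_a = log_Y.mean(axis=1) - b * log_x.mean()
    return np.exp(log_a), b

# Synthetic, noise-free example (illustrative values, not real data).
x = np.array([5.032530, 6.972868, 8.888268, 10.732009, 12.879130, 16.877655])
true_a = np.array([8.4, 12.0, 5.5])
true_b = np.array([-0.58, -0.75, -0.40])
Y = true_a[:, None] * x[None, :] ** true_b[:, None]

a, b = fit_power_law_rows(x, Y)
```

With a dataframe, the call would presumably look like fit_power_law_rows(df.columns.astype(float), df.to_numpy()) on the numeric columns only. If the log-space result is not accurate enough, the a and b values could still be used as per-row p0 starting guesses for curve_fit.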