I wrote a function to estimate the parameters of a logarithmic function.
My aim is to predict future values of data points that follow a logarithmic curve. What matters most is that the fit is accurate on the most recent data points rather than equally accurate across the whole series, since it is the prediction that counts. I currently minimize the Mean Squared Error to find the parameters, but I do not know how to weight it so that the most recent data points count more than the earliest ones.
Here is my equation:
y = C * log( a * x + b )
Here is my code:
```python
import numpy as np
from sklearn.metrics import mean_squared_error

def approximate_log_function(x, y):
    # Candidate values for each parameter of y = C * log(a * x + b)
    C = np.arange(0.01, 1, step=0.01)
    a = np.arange(0.01, 1, step=0.01)
    b = np.arange(0.01, 1, step=0.01)

    min_mse = 9999999999
    parameters = [0, 0, 0]

    # Brute-force grid search over every (C, a, b) combination
    for i in np.array(np.meshgrid(C, a, b)).T.reshape(-1, 3):
        y_estimation = i[0] * np.log(i[1] * np.array(x) + i[2])
        mse = mean_squared_error(y, y_estimation)
        if mse < min_mse:
            min_mse = mse
            parameters = [i[0], i[1], i[2]]

    return (min_mse, parameters)
```
In the image below, the orange curve is my data and the blue line is my fitted curve. You can see that the fitted line drifts away from the data towards the end, which is exactly what I would like to avoid in order to improve the prediction.
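The only idea I have so far for the weighting is to pass a `sample_weight` array to `mean_squared_error`. The linearly increasing weights below are just a guess on my part, shown on toy data:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy data roughly following y = C * log(a * x + b), just for illustration
x = np.arange(1, 51)
y_true = 0.5 * np.log(0.3 * x + 0.2)
y_pred = 0.45 * np.log(0.35 * x + 0.15)  # some candidate fit

# Linearly increasing weights: the last point counts 10x as much as the
# first. I picked linear weights arbitrarily; maybe exponential is better?
weights = np.linspace(0.1, 1.0, len(x))
weighted_mse = mean_squared_error(y_true, y_pred, sample_weight=weights)
print(weighted_mse)
```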
My question is twofold:
1. Is this actually the best approach, or would it be better to fit another function, such as the increasing form of an exponential decay, y = C * (1 - e^(-k * t)) with k > 0? (See the sketch at the end of this post.)
2. How can I change my code so that the last values count more in the fit than the first ones? Is something like the `sample_weight` sketch above the right direction?
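For reference, this is roughly how I imagine fitting the exponential-decay form with `scipy.optimize.curve_fit` instead of my grid search. The `sigma`-based weighting is my guess at how to emphasize recent points (smaller sigma means a point counts more), so treat it as a sketch, not something I know to be correct:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(t, C, k):
    # Increasing form of an exponential decay: y = C * (1 - e^(-k * t))
    return C * (1.0 - np.exp(-k * t))

# Toy data for illustration
x = np.arange(1.0, 51.0)
y = 0.8 * (1.0 - np.exp(-0.05 * x))

# curve_fit treats sigma as per-point uncertainty, so decreasing sigma over
# time should make the most recent points weigh more in the least-squares fit
sigma = np.linspace(1.0, 0.1, len(x))

# bounds keep C and k positive, matching the k > 0 constraint
params, _ = curve_fit(exp_decay, x, y, p0=[1.0, 0.1],
                      sigma=sigma, bounds=(0.0, np.inf))
C_fit, k_fit = params
print(C_fit, k_fit)
```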