
I created a function to estimate the parameters of a logarithm function.

My aim is to predict future results from data points that follow a logarithm function. What matters most is that my algorithm fits the last data points better than the earlier ones, since it is the prediction that counts. I currently use Mean Squared Error to optimize my parameters, but I do not know how to weight it so that my most recent data points count as more important than the first ones.

  • Here is my equation:

y = C * log( a * x + b )

  • Here is my code:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    def approximate_log_function(x, y):
        # Candidate values for each parameter of y = C * log(a*x + b)
        C = np.arange(0.01, 1, step=0.01)
        a = np.arange(0.01, 1, step=0.01)
        b = np.arange(0.01, 1, step=0.01)

        min_mse = np.inf
        parameters = [0, 0, 0]

        # Brute-force search over every (C, a, b) combination in the grid
        for i in np.array(np.meshgrid(C, a, b)).T.reshape(-1, 3):
            y_estimation = i[0] * np.log(i[1] * np.array(x) + i[2])
            mse = mean_squared_error(y, y_estimation)

            if mse < min_mse:
                min_mse = mse
                parameters = [i[0], i[1], i[2]]

        return (min_mse, parameters)

In the image below, the orange curve is the data I have and the blue line is my fitted line. The fitted line stretches a bit away from the data at the end, and I would like to avoid that to improve the prediction from my function.

(image: logarithm function graph)

My question is twofold:

  • Is this actually the best way to do it, or is it better to use another function, such as the increasing form of an exponential decay: y = C * (1 - e^(-k*t)), k > 0? (Both models are written out as code after this list.)

  • How can I change my code so that the last values carry more weight in the fit than the first ones?
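
For reference, here are the two candidate models written out as plain Python functions (the function names are just for illustration):

    import numpy as np

    def log_model(x, C, a, b):
        # Current model: y = C * log(a*x + b)
        return C * np.log(a * x + b)

    def exp_saturation_model(t, C, k):
        # Alternative: increasing form of an exponential decay,
        # y = C * (1 - e^(-k*t)) with k > 0, which saturates at C
        return C * (1.0 - np.exp(-k * t))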

Jordan Delbar
  • You can use the `sample_weight` parameter of [`mean_squared_error`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) to give a different weight to each example. – jdehesa Feb 06 '19 at 17:12
  • Not a pro of maths here, but I do know that curve fitting is best done by using numpy's `polyfit` function. As far as I understand it, you perform a log curve fit by transforming your `x` values into `log(x)` and then performing a simple linear polyfit. The advantage is that `polyfit` has a weighting factor to put emphasis on larger values. This may be exactly what you are looking for. Further reading: https://stackoverflow.com/questions/3433486/how-to-do-exponential-and-logarithmic-curve-fitting-in-python-i-found-only-poly – offeltoffel Feb 06 '19 at 17:12
  • Thanks, I have pretty good results with this function: `a = numpy.polyfit(numpy.log(x), y, 1)`, and my calculation is now much faster. – Jordan Delbar Feb 07 '19 at 12:38
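
A minimal sketch of the `polyfit` approach from the comments above: fit y = m * log(x) + c as a straight line in log-space, using the `w` argument to weight later points more (the weight ramp below is an illustrative choice, not from the comments):

    import numpy as np

    def fit_log_polyfit(x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # Linear ramp: the most recent points get the largest weights;
        # the 1-to-3 range is an arbitrary illustrative choice
        w = np.linspace(1.0, 3.0, len(x))
        # Degree-1 polynomial in log(x), i.e. y = m * log(x) + c
        m, c = np.polyfit(np.log(x), y, 1, w=w)
        return m, c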

1 Answer


Usually, in non-linear least squares, the inverse of the y values is taken as the weight, which essentially suppresses outliers. You can expand on that idea by adding a function that computes the weight from the x position.

def xWeightA(x):
    # Weight 1 for the first 90% of the points, 1.2 for the last 10%
    container = []
    for k in range(len(x)):
        if k < int(0.9 * len(x)):
            container.append(1)
        else:
            container.append(1.2)
    return container

def approximate_log_function(x, y):

    C = np.arange(0.01, 1, step=0.01)
    a = np.arange(0.01, 1, step=0.01)
    b = np.arange(0.01, 1, step=0.01)

    min_mse = np.inf
    parameters = [0, 0, 0]
    # Per-point weights: later points count more in the error
    LocalWeight = np.array(xWeightA(x))

    for i in np.array(np.meshgrid(C, a, b)).T.reshape(-1, 3):

        y_estimation = i[0] * np.log(i[1] * np.array(x) + i[2])
        # Weighted MSE: sample_weight scales each point's squared error,
        # so the weights penalize misfit instead of distorting the model
        mse = mean_squared_error(y, y_estimation, sample_weight=LocalWeight)

        if mse < min_mse:
            min_mse = mse
            parameters = [i[0], i[1], i[2]]

    return (min_mse, parameters)

Also, it looks like you're evaluating the objective function over the complete grid, which makes the code take too much time to find the minimum (at least on my machine). You can use `curve_fit` or `polyfit` as suggested, but if the goal is to write the optimizer yourself, try adding an early break or a random search through the grid.
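
As a concrete example of the `curve_fit` route, here is a sketch that down-weights the older points (the sigma decay, starting values, and bounds are illustrative choices, not from the question):

import numpy as np
from scipy.optimize import curve_fit

def log_model(x, C, a, b):
    return C * np.log(a * x + b)

def fit_weighted(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # curve_fit treats sigma as per-point uncertainty: smaller sigma
    # means more weight, so let it decay toward the end of the series
    sigma = np.exp(-np.linspace(0.0, 2.0, len(x)))
    # Positive bounds keep the argument of the log positive (assuming x > 0)
    popt, _ = curve_fit(log_model, x, y, p0=[0.5, 0.5, 0.5],
                        sigma=sigma, bounds=(1e-6, np.inf))
    return popt  # C, a, b

Hope it helps.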

TavoGLC