6

I have the following points

0 4194304
1 497420
2 76230
3 17220
4 3595
5 1697
6 491
7 184
8 54
9 15
10 4
11 4
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1

If I plot them with a log scale on the y-axis they look roughly linear. How can I fit a straight line to the data on this log scale?

My current code is very crude. For each x, y pair I do:

xcoords.append(x)
ycoords.append(math.log(y))

And then at the end I do

plt.plot(xcoords,ycoords)
plt.show()
marshall
  • IIRC, this is called _logarithmic regression_, or something like that. – rodrigo Dec 10 '13 at 10:44
  • This may be relevant: http://stackoverflow.com/questions/3433486/how-to-do-exponential-and-logarithmic-curve-fitting-in-python-i-found-only-poly – James Mills Dec 10 '13 at 11:10

3 Answers

1

This solution uses the least squares fitting method from numpy (docs).

This page provides an example usage of linear regression, on linear data.

Because your data is log-linear, we transform it first (take the natural log of y) and then run a linear fit.

import numpy as np
import matplotlib.pyplot as plt

d = '''
0 4194304
1 497420
 ... (put all the rest of the data in here)
'''

D = np.loadtxt(d.split('\n'))

x = D[:,0]
y = D[:,1]
y_ln = np.log(y)

n = D.shape[0]

A = np.array(([[x[j], 1] for j in range(n)]))
B = np.array(y_ln[0:n])

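# rcond=None selects the current default cutoff and avoids a FutureWarning on newer numpy
X = np.linalg.lstsq(A, B, rcond=None)[0]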
a=X[0]; b=X[1]

# so now your fitted line is log(y) = a*x + b
# let's show it on a graph.
plt.figure()
plt.plot(x, a*x+b, '--')
plt.plot(x, y_ln, 'o')
plt.ylabel('log y')
plt.xlabel('x values')
plt.show()

# or use the original scales by transforming the data back again:

plt.figure()
plt.plot(x, np.exp(a*x+b), '--')
plt.plot(x, y, 'o')
plt.ylabel('y')
plt.xlabel('x values')
plt.yscale('log')
plt.show()

(Figure: fitting all the data)
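As a cross-check on the least-squares setup above, here is a shorter sketch of the same log-linear fit using `np.polyfit`. It is not part of the original answer; it assumes the full data set from the question and uses illustrative variable names (`slope`, `intercept`, `y_fit`).

```python
import numpy as np

# Data from the question: x = 0..21 and the corresponding y values.
x = np.arange(22)
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184,
              54, 15, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)

# Degree-1 polynomial fit to (x, ln y).
# np.polyfit returns coefficients with the highest power first.
slope, intercept = np.polyfit(x, np.log(y), 1)

# Back-transform: ln y = slope*x + intercept  <=>  y = exp(intercept) * exp(slope*x)
y_fit = np.exp(slope * x + intercept)

print(slope, intercept)
```

The two values printed should match `a` and `b` from the `lstsq` version, since both solve the same least-squares problem on the log-transformed data.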

However, your data seems to have two regimes, so a single linear fit doesn't capture it well. You could instead describe it as two distinct regimes, which may or may not be appropriate depending on where your data comes from and whether you can explain the point at which the regime changes.

So let's take the first part of your data and fit just that:

n = 13  # fit only the first 13 points (x = 0..12)
A = np.array(([[x[j], 1] for j in range(n)]))
B = np.array(y_ln[0:n])

# rcond=None selects the current default cutoff and avoids a FutureWarning on newer numpy
X = np.linalg.lstsq(A, B, rcond=None)[0]
a=X[0]; b=X[1]

plt.figure()
plt.plot(x[0:n], np.exp(a*x[0:n]+b), '--')
plt.plot(x, y, 'o')
plt.ylabel('y')
plt.xlabel('x values')
plt.yscale('log')
plt.show()

(Figure: fitting part of the data)

This is a better fit to the first part of the data (but it may not be particularly meaningful -- that depends on what process generated the data points).
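To put a rough number on "better fit", one option is to compare the root-mean-square residual (in log units) of the line fitted to all points against the line fitted to the first 13. This is only a sketch: `log_linear_fit` is a hypothetical helper name, not something from the answer above, and the RMS residual is just one of several reasonable measures.

```python
import numpy as np

x = np.arange(22)
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184,
              54, 15, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
y_ln = np.log(y)

def log_linear_fit(x, y_ln, n):
    """Fit ln y = a*x + b to the first n points; return a, b and the RMS residual."""
    a, b = np.polyfit(x[:n], y_ln[:n], 1)
    resid = y_ln[:n] - (a * x[:n] + b)
    return a, b, np.sqrt(np.mean(resid ** 2))

# Compare the fit over all points with the fit over the first 13 points.
for n in (len(x), 13):
    a, b, rms = log_linear_fit(x, y_ln, n)
    print(f"n={n}: a={a:.3f}, b={b:.3f}, RMS residual (log units)={rms:.3f}")
```

A smaller RMS residual for n = 13 only says the line describes those points more closely; it says nothing about whether the two-regime description is physically meaningful.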

Bonlenfum
1

Rather than changing the data you can just plot it on a semi-log plot. So for example you could do:

    import matplotlib.pyplot as plt

    xArray = range(22)
    yArray = [4194304,497420,76230,17220,3595,1697,491,184,54,15,4,4,1,1,1,1,1,1,1,1,1,1]

    plt.semilogy(xArray,yArray)
    plt.show()

As for doing the curve fitting, try the following:

    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    from numpy import square

    xArray = range(22)
    yArray = [4194304,497420,76230,17220,3595,1697,491,
              184,54,15,4,4,1,1,1,1,1,1,1,1,1,1]

    def f(x,a,b,c):
        return a*(square(x))+(b*x)+c

    popt, pcov = curve_fit(f, xArray, yArray)

    fittedA = popt[0]
    fittedB = popt[1]
    fittedC = popt[2]

    yFitted = f(xArray,fittedA,fittedB,fittedC)

    plt.figure()
    plt.semilogy(xArray,yFitted)
    plt.show()

You will need to come up with a better fitting function than the quadratic I have used in f() to get a good fit, but this should do what you need.
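One candidate for a better fitting function, given that the data looks roughly linear on a log scale, is an exponential `y = a*exp(b*x)`. The sketch below is an assumption on my part rather than part of the code above: it fits the model in log space (`log_model` is an illustrative name), which keeps the problem linear in its parameters and stops the first few very large y values from dominating the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(22, dtype=float)
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184,
              54, 15, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)

def log_model(x, log_a, b):
    # ln y = ln a + b*x, which is equivalent to y = a * exp(b*x)
    return log_a + b * x

# Fit the model to ln(y) instead of y.
popt, pcov = curve_fit(log_model, x, np.log(y))
log_a, b = popt

# Transform back to the original scale; the result is strictly positive,
# so it can be drawn with plt.semilogy(x, y_fitted).
y_fitted = np.exp(log_model(x, log_a, b))
```

Fitting `a*exp(b*x)` directly to y is also possible with `curve_fit`, but it usually needs a sensible initial guess (`p0`) and weights the largest values most heavily.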

Tommy
0

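You could cut off the trailing zeros of log_y (keep only x[0:12]), build an interpolation function from (x[0:12], log_y[0:12]), and generate a denser linear space over the same range: x has 12 items, while the new array has 50 points spanning the values 0 to x[11] (a range of values, not of item indexes). Then plot the interpolation function evaluated on the new points, as follows: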

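>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.interpolate import interp1d
>>> x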
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
>>> y
[4194304, 497420, 76230, 17220, 3595, 1697, 491, 184, 54, 15, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> log_y
[15.249237972318797, 13.117190018630332, 11.24151036498232, 9.753826777981722, 8.187299270155147, 7.436617265234227, 6.19644412779452, 5.214935757608986, 3.9889840465642745, 2.70805020110221, 1.3862943611198906, 1.3862943611198906, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>> f2=interp1d(x[0:12],log_y[0:12],kind='cubic')
>>> x_new_fit=np.linspace(0,x[11],50)
>>> plt.plot(x_new_fit,f2(x_new_fit),'-')
[<matplotlib.lines.Line2D object at 0x3a6e950>]
>>> plt.show()

Experiment with different kinds of interpolation to achieve different degrees of smoothness:

>>> f1=interp1d(x[0:12],log_y[0:12],kind='quadratic')
>>> plt.plot(x[0:12],log_y[0:12],'-',x_new_fit,f2(x_new_fit),'-',x_new_fit,f1(x_new_fit),'--')
[<matplotlib.lines.Line2D object at 0x3a97dd0>, <matplotlib.lines.Line2D object at 0x3a682d0>, <matplotlib.lines.Line2D object at 0x3a687d0>]
>>> plt.show()
>>> 
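For reference, the session above can be written as a standalone script. This is a sketch under the same assumptions (first 12 points only, cubic interpolation), with illustrative names such as `f_cubic` and `x_new`. The final check highlights one property worth keeping in mind: an interpolant passes through every input point, so it smooths the curve between samples and does not estimate a trend the way a regression line does.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.arange(22)
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184,
              54, 15, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
log_y = np.log(y)

# Keep only the first 12 points, before log(y) flattens out at 0.
f_cubic = interp1d(x[:12], log_y[:12], kind='cubic')

# 50 evenly spaced x values between 0 and x[11] (values, not indexes).
x_new = np.linspace(0, x[11], 50)
log_y_new = f_cubic(x_new)

# Check that the interpolant reproduces the original samples.
passes_through_points = np.allclose(f_cubic(x[:12]), log_y[:12])
print(passes_through_points)
```

To plot it, `plt.plot(x_new, log_y_new, '-')` followed by `plt.show()` gives the same curve as in the session above.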
hardcode