4

[Original Question]

I need an equation for curve that increases indefinitely as time elapsed increases based on data below. How to get that?

[Updates on the question]

I need to specify proper parameters for scipy.interpolate.splrep. Can someone helpout?

Also, Is there a way to get an equation from coeffs of b spline?

[Alternate question]

How to get a fit using decomposition of signal in Fourier series?

This plot seems to be a combination of linear plot, a periodic function pf1 which increases four time, a larger periodic function which causes pf1 to happen again and again indefinitely. The difficulty of the plot is the reason which is why this question has been asked.

enter image description here

Data:

Time elapsed in sec.    TX + RX Packets
(0,0)
(10,2422)
(20,2902)
(30,2945)
(40,3059)
(50,3097)
(60,4332)
(70,4622)
(80,4708)
(90,4808)
(100,4841)
(110,6081)
(120,6333)
(130,6461)
(140,6561)
(150,6585)
(160,7673)
(170,8091)
(180,8210)
(190,8291)
(200,8338)
(210,8357)
(220,8357)
(230,8414)
(240,8414)
(250,8414)
(260,8414)
(270,8414)
(280,8414)
(290,8471)
(300,8471)
(310,8471)
(320,8471)
(330,8471)
(340,8471)
(350,8471)
(360,8471)
(370,8471)
(380,8471)
(390,8471)
(400,8471)
(410,8471)
(420,8528)
(430,8528)
(440,8528)
(450,8528)
(460,8528)
(470,8528)
(480,8528)
(490,8528)
(500,8528)
(510,9858)
(520,10029)
(530,10129)
(540,10224)
(550,10267)
(560,11440)
(570,11773)
(580,11868)
(590,11968)
(600,12039)
(610,13141)

My Code:

import numpy as np
import matplotlib.pyplot as plt

points = np.array(
    [(0,0), (10,2422), (20,2902), (30,2945), (40,3059), (50,3097), (60,4332), (70,4622), (80,4708), (90,4808), (100,4841), (110,6081), (120,6333), (130,6461), (140,6561), (150,6585), (160,7673), (170,8091), (180,8210), (190,8291), (200,8338), (210,8357), (220,8357), (230,8414), (240,8414), (250,8414), (260,8414), (270,8414), (280,8414), (290,8471), (300,8471), (310,8471), (320,8471), (330,8471), (340,8471), (350,8471), (360,8471), (370,8471), (380,8471), (390,8471), (400,8471), (410,8471), (420,8528), (430,8528), (440,8528), (450,8528), (460,8528), (470,8528), (480,8528), (490,8528), (500,8528), (510,9858), (520,10029), (530,10129), (540,10224), (550,10267), (560,11440), (570,11773), (580,11868), (590,11968), (600,12039), (610,13141)]
    )
# get x and y vectors
x = points[:,0]
y = points[:,1]

# calculate polynomial
z = np.polyfit(x, y, 3)
print z
f = np.poly1d(z)

# calculate new x's and y's
x_new = np.linspace(x[0], x[-1], 50)
y_new = f(x_new)

plt.plot(x,y,'o', x_new, y_new)
plt.xlim([x[0]-1, x[-1] + 1 ])
plt.show()

My Output:

enter image description here

My Code 2:

import numpy as N
from scipy.interpolate import splprep, splev

x = N.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610])
y = N.array([0, 2422, 2902, 2945, 3059, 3097, 4332, 4622, 4708, 4808, 4841, 6081, 6333, 6461, 6561, 6585, 7673, 8091, 8210, 8291, 8338, 8357, 8357, 8414, 8414, 8414, 8414, 8414, 8414, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8471, 8528, 8528, 8528, 8528, 8528, 8528, 8528, 8528, 8528, 9858, 10029, 10129, 10224, 10267, 11440, 11773, 11868, 11968, 12039, 13141])

# spline parameters
s=1.0 # smoothness parameter
k=3 # spline order
nest=-1 # estimate of number of knots needed (-1 = maximal)

# find the knot points
tckp,u = splprep([x,y],s=s,k=k,nest=nest,quiet=True,per=1)

# evaluate spline, including interpolated points
xnew,ynew = splev(N.linspace(0,1,400),tckp)

import pylab as P
data,=P.plot(x,y,'bo-',label='data')
fit,=P.plot(xnew,ynew,'r-',label='fit')
P.legend()
P.xlabel('x')
P.ylabel('y')

P.show()

My Output 2: enter image description here

sinhayash
  • 2,693
  • 4
  • 19
  • 51
  • Maybe https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=python+fitting+equation+to+data ? [ask] – boardrider Jun 19 '15 at 10:33
  • 1
    This is not a simple curve fitting, which I have explained why. I have updated what I have tried. – sinhayash Jun 19 '15 at 10:53
  • The third result of the suggested google search is http://stackoverflow.com/questions/5762446/python-find-a-best-fit-function-for-a-list-of-data, and the sixth is http://exnumerus.blogspot.co.il/2010/03/regression-curve-fitting-in-python-pt-1.html: aren't these what you look for? – boardrider Jun 19 '15 at 11:07
  • result 3 is similar to what i have tried. 6th also shows polynomial fits. I need probably a higher degree fit, and some period component (sin/cos) which I'm not sure. http://stats.stackexchange.com/questions/59423/regression-model-for-periodic-data can help, but as mentioned I am no mathematics expert. Please help with some code examples – sinhayash Jun 19 '15 at 11:27
  • Just do a decomposition of your signal in [Fourier series](https://stackoverflow.com/q/4258106/1791279) (not Fourier transform) then reconstruct the signal with only the few initial coefficients, this will give you a reasonable periodic fit. – rth Jun 19 '15 at 11:41
  • Can you give a better example rth ? – sinhayash Jun 19 '15 at 11:48
  • 1
    Actually, it might be simpler to just use [`scipy.interpolate.splrep`](http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.interpolate.splrep.html) with `per=1` . – rth Jun 19 '15 at 15:08

1 Answers1

2

It looks like you have a reaction kinetic:

#%%
import numpy as np
from scipy.integrate import odeint
from scipy import optimize
from matplotlib import pyplot as plt
#%%
data = []
with open('data.txt', 'r') as f:
    for line in f:
        data.append(line.strip(' \n ()').split(','))
data = np.array(data,dtype=float)
data = data[0:-1].T

#%%
slope = np.diff(data[1])
index = np.where(slope>1000)
index = np.append(index, len(data[0]) -1 )
plt.plot(data[0],data[1],'.')
plt.plot(data[0,index],data[1,index],'ro')
plt.plot(data[0,1:],np.diff(data[1]))

enter image description here

From here in I assume that the reaction starts at every marked spot (red). I am sure the code could be written much cleaner, but it is a first quick and dirty hack. You could use scipy curvefit or similar to fit the raction constant k

#%%
time = data[0,index]

def model(y,t,k):
    dydt = np.zeros(2)
    dydt[0] = -k*y[0]
    dydt[1] = k*y[0]

    return dydt


def res(k):
    y_hat = []
    t_hat = []
    for i in xrange(len(index) -1):
        '''
        I assume that at every timepoint the reaction is initiated by
        adding y[i + 1] - y[i] new datapackages. Over time they are 
        converted to sent packages. All packages which do not react,
        do not take part in the next cycle.
        '''
        y0 = [data[1, index[i+1]] - data[1, index[i]], 0]
        t0 = data[0, index[i]:index[i+1]]
        y_int,info = odeint(model, y0, t0, args=(k,), full_output = 1 )
        # I am not very happy about the construct below, but could
        # not find a better solution.
        y_hat.append(list(y_int[:,1]))
        t_hat.append(list(t0))
    return y_hat,t_hat

k = 2e-1
y,t = res(k)
''' It may be possible to play with y0[1] in the model in order 
to avoid the construct below. But since I started every reaction at y[1]=0
I have to add the last y value from the last step. This is a bit of a hack,
since data[1, index[i]] is not necessarily the corresponding solution. But hey, It seems to work.
'''
y_hat = [subitem + data[1, index[i]] for i,item in enumerate(y) for subitem in item]
t_hat = [subitem for item in t for subitem in item]
y_hat = np.array(y_hat,dtype=float)
t_hat = np.array(t_hat,dtype=float)


#%%
plt.plot(data[0],data[1],'.')
plt.plot(data[0,index],data[1,index],'ro')
plt.plot(t_hat,y_hat,'go')

enter image description here

Another approach could be (maybe physically more correct) to add a CDF of a gaussian peak at each timepoint.

Moritz
  • 5,130
  • 10
  • 40
  • 81