I have a time series that looks like:

timeseries1 = [{'price': 250, 'time': 1.52},
    {'price': 251, 'time': 3.65},
    {'price': 253, 'time': 10.1},
    {'price': 254, 'time': 10.99}]

I want to be able to interpolate this data so that it moves forward in small timesteps, and have something like:

timeStep = 0.1
timeseries2 = [{'price': 250, 'time': 1.5},
    {'price': 250, 'time': 1.6},
    {'price': 250, 'time': 1.7},
    ...
    {'price': 250, 'time': 3.6},
    {'price': 251, 'time': 3.7},
    {'price': 251, 'time': 3.8},
    {'price': 251, 'time': 3.9},
    ...
    {'price': 251, 'time': 10.0},
    {'price': 253, 'time': 10.1},
    {'price': 253, 'time': 10.2},
    {'price': 253, 'time': 10.3},
    ...
    {'price': 253, 'time': 10.9},
    {'price': 254, 'time': 11.0}]

I'm really unsure how to do this efficiently and hope there is a nice, pythonic way to do so. What I've tried is iterating through timeseries1 with a while loop that appends new values to the end of timeseries2, but having two nested loops seems very inefficient.

Edit: Here is the code/algorithm currently being used to do this.

import math

startTime = math.floor(timeseries1[0]['time'] / timeStep) * timeStep
oldPrice = timeseries1[0]['price']
timeseries3 = []
timeseries3.append(timeseries1[0])
timeseries3[0]['time'] = startTime  # snap the first entry onto the grid
for x in timeseries1[1:]:
    # repeat the last known price on the grid until the next observation
    while startTime < x['time']:
        timeseries3.append({'price': oldPrice, 'time': startTime})
        startTime += timeStep
    oldPrice = x['price']

So that timeseries3 will be the same as timeseries2 in the end.
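For comparison, the same forward fill can be done in a single pass, stepping over integer grid indices so the 0.1 step never accumulates floating-point error. This is only an illustrative sketch (the forward_fill helper is hypothetical and not from the original post):

import math

def forward_fill(series, step):
    # hypothetical helper: yield {'price', 'time'} dicts on a uniform grid,
    # carrying the last known price forward; times are integer multiples of step
    k = int(math.floor(series[0]['time'] / step))
    last = int(math.ceil(series[-1]['time'] / step))
    i = 0
    price = series[0]['price']
    while k <= last:
        t = k * step
        # advance to the most recent observation at or before t
        while i < len(series) and series[i]['time'] <= t:
            price = series[i]['price']
            i += 1
        yield {'price': price, 'time': round(t, 10)}
        k += 1

timeseries2 = list(forward_fill(timeseries1, timeStep))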

    Do you mean interpolate? Also, for work like that, pandas is a good option. Take a look at http://stackoverflow.com/questions/40161155/linearly-interpolating-pandas-time-series – pvg Mar 29 '17 at 01:41
  • Sorry yes, interpolate! I'll have a look now. – mojo1mojo2 Mar 29 '17 at 01:47
  • That solution looks very nice, but how do you forward fill when interpolating (instead of using linear). Looking at the documentation for interpolate, there is no option for it. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html – mojo1mojo2 Mar 29 '17 at 03:57
  • Have you searched for it? It seems like a separate question. Pandas is a fairly big topic but covered extensively here on SO (and elsewhere on the web). If what you're after is working with data (rather than writing basic data analysis and manipulation from scratch), you almost certainly want to learn and use pandas. When and if you run into other specific problems, search or ask here. – pvg Mar 29 '17 at 04:04
  • I have searched for it, and it (to me) seems related to my current question because I would like to use df.interpolate(step_function/ffill) to solve the problem. I will continue trying to figure it out myself and if I get stuck I can always post another question :) – mojo1mojo2 Mar 29 '17 at 04:14
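Following up on the pandas suggestion in the comments: a step-function/forward-fill resample can be expressed with reindex(method='ffill') rather than interpolate(). The following is only a sketch assuming pandas and NumPy are available; the grid construction and the final rounding to one decimal (matching the 0.1 step) are illustrative choices, not from the thread:

import math
import numpy as np
import pandas as pd

timeseries1 = [{'price': 250, 'time': 1.52},
               {'price': 251, 'time': 3.65},
               {'price': 253, 'time': 10.1},
               {'price': 254, 'time': 10.99}]
timeStep = 0.1

s = pd.Series([d['price'] for d in timeseries1],
              index=[d['time'] for d in timeseries1])

# build the uniform grid from integer multiples of the step to limit float drift
start = math.floor(s.index[0] / timeStep)
stop = math.ceil(s.index[-1] / timeStep)
grid = np.arange(start, stop + 1) * timeStep

# forward-fill: each grid point takes the last observation at or before it;
# the first grid point precedes the first observation, so backfill that one
s2 = s.reindex(grid, method='ffill').bfill()

timeseries2 = [{'price': int(p), 'time': round(t, 1)} for t, p in s2.items()]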

2 Answers


Try RedBlackPy. The RedBlackPy.Series class is built on red-black trees for convenient work with time series, and it has interpolation methods built into the getitem operator (Series[key]).

import redblackpy as rb

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# create a Series with 'floor' interpolation:
# at any time t it returns the last known value (your case)
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )
# now you can read any key via interpolation, without inserting it,
# and you can create a new series with the required time step
# args of the uniform method: (start, end, step)
new_series = series.uniform(1.5, 11, 0.1)
# the required result!
print(new_series)

The output of the last print is the following (note the floating-point arithmetic issues):

Series object Untitled
1.5: 0.0
1.6: 250.0
1.7000000000000002: 250.0
1.8000000000000003: 250.0
1.9000000000000004: 250.0
2.0000000000000004: 250.0
2.1000000000000005: 250.0
...
9.89999999999998: 251.0
9.99999999999998: 251.0
10.09999999999998: 251.0
10.19999999999998: 253.0
10.29999999999998: 253.0
10.399999999999979: 253.0
10.499999999999979: 253.0
10.599999999999978: 253.0
10.699999999999978: 253.0
10.799999999999978: 253.0
10.899999999999977: 253.0
10.999999999999977: 254.0

Remember, with interpolation you have access at any key! You don't have to create a new series if you just want to iterate over it with a uniform time step. You can do that with a RedBlackPy.Series using no additional memory:

import redblackpy as rb

# create an iterator for the time grid
def grid_generator(start, stop, step):
    it = start - step
    while it <= stop:
        it += step
        yield it

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# create a Series with 'floor' interpolation:
# at any time t it returns the last known value (your case)
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )

# ok, now we iterate over our Series (which stores only 4 elements!)
for key in grid_generator(1.6, 11, 0.1):
    print(series[key])  # prints the last known value (your case)
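
As a side note, the drift visible in the output above (1.7000000000000002, 9.89999999999998, ...) comes from repeatedly adding 0.1. A grid generator that scales integers instead avoids accumulating that error; this is only a sketch of that idea, not part of the original answer:

def grid_generator(start, stop, step):
    # step over integer multiples of the step so floating-point error
    # never accumulates from one grid point to the next
    n = round((stop - start) / step)
    for k in range(n + 1):
        yield round(start + k * step, 10)

# e.g. list(grid_generator(1.5, 11, 0.1)) gives 1.5, 1.6, ..., 11.0 without drift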

...hope there will be a nice pythonic way to do so.

Here's a pythonic way of generating a list: using a generator! However, I must admit that the following code has issues:

def timeseries( t1, t2, p1, coeff, step ):
  t = t1
  while t <= t2:
    yield { 'price' :  int( p1 + ( t - t1 ) * coeff), 'time' : t }
    t += step


print list(timeseries( 1.5, 11 , 250 , 0.43 , 0.1 ) )

So, the generator might be a "fun" way to create your time series. However, it needs work, due to the floating-point arithmetic problems I see when I run it:

[{'price': 250, 'time': 1.5}, {'price': 250, 'time': 1.6}, {'price': 250, 'time': 1.7000000000000002}, {'price': 250, 'time': 1.8000000000000003}, {'price': 250, 'time': 1.9000000000000004}, {'price': 250, 'time': 2.0000000000000004}, {'price': 250, 'time': 2.1000000000000005}, {'price': 250, 'time': 2.2000000000000006}, {'price': 250, 't...

While I think the above code is easy to read (though the variable names could be more descriptive, and a comment or two would have been nice), here is an even tighter piece of Python code that accomplishes the same thing. Instead of declaring a generator function, it uses an anonymous generator expression.

For completeness, I've added a line to figure out the slope of the data to perform the interpolation.

(t1, p1, t2, p2) = ( 1.52, 250.0, 10.99, 254.0 )
coeff = ( p2 - p1 ) / ( t2 - t1 )
# interpolate from t1: the (i/10.0 - t1) offset keeps the line anchored at p1
print list( { 'time' : i/10.0, 'price' : int( (i/10.0 - t1) * coeff * 100 ) / 100 + p1 } for i in range( int( t1 * 10 ), int( t2 * 10 ) + 1 ) )

The code could be generalized even further. The 10.0 and 100 values are there to do integer math and keep only the significant digits we care about. This is cleaner than the previous code, where the time value got wonky just from repeatedly adding the step of 0.1 to the current time t (t += step). This site talks about using a frange generator built on decimal.Decimal. In my Python 2.7 environment I couldn't get that to work properly, so I just hard-coded the scale/significant digits into the formula (again, not very general).
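
For reference, a minimal Decimal-based frange along the lines of the one mentioned above might look like the sketch below; it is only an illustration of the idea, not the code from the linked site, and it works in both Python 2.7 and 3:

from decimal import Decimal

def frange(start, stop, step):
    # exact decimal steps, converted back to float on the way out
    t, stop, step = Decimal(str(start)), Decimal(str(stop)), Decimal(str(step))
    while t <= stop:
        yield float(t)
        t += step

# e.g. list(frange(1.5, 11, 0.1)) gives clean 1.5, 1.6, ..., 11.0 values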
