I want to update a list of objects in Python as fast as possible. The following code shows what I am doing right now:

from bokeh.models.sources import ColumnDataSource
from random import randint

n = 10
m = 50
sources = []
for i in range(n):
    # all list elements have similar structure
    sources.append(ColumnDataSource(data=dict(x=range(m), y=range(m), count=[0])))

def some_function():
    # do some computation
    return [randint(0, m) for i in range(n)]

def update():
    # this function is called every 20ms
    for s in sources:
        s.data = dict(x=some_function(), y=s.data['y'], count=[s.data['count'][0]+1])

The for loop in my update() function takes too long. I have a lot of lists to update, and the function is called every 20 ms. Sometimes update() takes more than 20 ms to execute.

From my research so far I know that list comprehensions are much faster than for loops, but I cannot use them in my case, can I? Something like:

#not working code
sources = [dict(x=.., y=.., count=..) for s.data in sources]
OneCricketeer
jofroe
  • List comprehensions are faster, but we're talking about _microseconds_ at most for most cases; they won't fix 20 _millisecond_ delays (and they'd have no effect at all on the speed of `update`, only on the initial construction of `sources`). What are you doing in `some_function`? That's the only thing you're not showing which could reasonably be taking 20 ms or more. – ShadowRanger Oct 31 '16 at 14:33
  • I'd put this as an answer, but I don't think it's complete or useful on its own. Profile it. Yes, for loops are slow, but that's not going to be what's slowing you down in most cases. For examples of how to profile, see http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script. – user2699 Oct 31 '16 at 14:36
  • It's also worth mentioning: the purpose of a Bokeh server is to keep a Python process in sync with a browser view of the app, across the network, automatically. Setting `s.data = ...` triggers *network communication* to update the browser's view of the data source. It's not clear from your question whether you've included or excluded the time for the network updates in your 20 ms estimate. – bigreddot Oct 31 '16 at 15:30
  • @ShadowRanger OK, that's good to know. I do different things, e.g. slicing a 6-dimensional tensor or generating an image. I just thought that I could maybe speed up the looping process, but now I see that it plays a minor role. @bigreddot I measure the execution time with `time.clock()` in my update callback. – jofroe Nov 07 '16 at 15:56
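
As a rough illustration of the "profile it" advice in the comments above, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. `FakeSource` is a hypothetical stand-in for bokeh's `ColumnDataSource` (it only stores a dict and does no network sync), so the numbers say nothing about Bokeh itself; the sketch only shows how to see which function dominates the time spent in `update()`.

```python
import cProfile
import io
import pstats
from random import randint

n = 10
m = 50


class FakeSource(object):
    """Hypothetical stand-in for ColumnDataSource: it only holds a dict."""

    def __init__(self, data):
        self.data = data


sources = [FakeSource(dict(x=list(range(m)), y=list(range(m)), count=[0]))
           for _ in range(n)]


def some_function():
    # placeholder for the real computation
    return [randint(0, m) for _ in range(n)]


def update():
    for s in sources:
        s.data = dict(x=some_function(), y=s.data['y'],
                      count=[s.data['count'][0] + 1])


# Profile 100 calls of update() and collect the statistics.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    update()
profiler.disable()

# Write the ten most expensive entries (by cumulative time) into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
print(report)
```

In the report, the cumulative time of `some_function` can be compared with that of `update` to see how much of each call is computation rather than looping. Note that `time.clock()` was removed in Python 3.8; `time.perf_counter()` is the usual replacement for manual timing.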

3 Answers

I'm not sure it will be faster, but you can use a list comprehension:

sources = [ColumnDataSource(data=dict(x=some_function(), y=s.data['y'], count=[s.data['count'][0] + 1])) for s in sources]

It will not work if you have to keep the same objects though.

S. de Melo

You can use a list comprehension for both the initial for loop and the update function.

Initial for loop:

sources = [ColumnDataSource(data=dict(x=range(m), y=range(m), count=[0])) for i in range(n)]

Update loop:

s_updated = [ColumnDataSource(data=dict(x=some_function(), y=s.data['y'], count=[s.data['count'][0]+1])) for s in sources]
Greg
  • This should be timed with `timeit` to be sure. The overhead of creating a new `ColumnDataSource` each time might make it slower than the poster's current for loop. – Guillaume Oct 31 '16 at 14:54
  • No, that unfortunately doesn't work in my case. I have to update the existing objects, otherwise the JavaScript part (that uses the ColumnDataSource) wouldn't get the changes. – jofroe Nov 07 '16 at 15:38
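
To illustrate the point about object identity raised in the comments above, here is a small sketch. `FakeSource` is a hypothetical stand-in for `ColumnDataSource`, not the bokeh class. It shows that assigning to `.data` inside a for loop keeps the same objects, while a list comprehension that calls the constructor produces new ones, so anything holding a reference to the old objects would not see the update.

```python
class FakeSource(object):
    """Hypothetical stand-in for ColumnDataSource: it only holds a dict."""

    def __init__(self, data):
        self.data = data


sources = [FakeSource({'x': [0], 'count': [0]}) for _ in range(3)]
original_ids = [id(s) for s in sources]

# In-place update: the objects in the list stay the same, only .data changes.
for s in sources:
    s.data = {'x': [1], 'count': [s.data['count'][0] + 1]}
same_after_loop = [id(s) for s in sources] == original_ids

# Rebuilding with a comprehension: every element is a brand-new object.
rebuilt = [FakeSource({'x': [2], 'count': [s.data['count'][0] + 1]})
           for s in sources]
same_after_rebuild = any(new is old for new, old in zip(rebuilt, sources))

print(same_after_loop, same_after_rebuild)  # prints: True False
```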

Initialising a dict with the {'key': 'value', ...} notation is faster than using dict(), so I'd use that:

import timeit

timeit.timeit('{"a": 1, "b": 2}', number=1000000)
0.1645284985700215

timeit.timeit('dict(a=1, b=2)', number=1000000)
0.4730025877568096

That gives:

def update():
    # this function is called every 20ms
    for s in sources:
        s.data = {'x': some_function(), 'y': s.data['y'], 'count': [s.data['count'][0]+1]}

And BTW, why is "count" a list? An integer should be enough.

Guillaume
  • While calling the `dict` constructor is slower (generalized `LOAD_GLOBAL` + `CALL_FUNCTION` is much slower than optimized `BUILD_MAP`, relatively speaking), unless `sources` is huge, this won't be meaningful. The total cost of using `dict` in your microbenchmark works out to just under half a microsecond per usage, vs. 1/6 of a microsecond with syntax. Sure, it's roughly triple the cost of using dict literals, but it's still trivial; for the 50 values in `sources`, the overhead of the `dict` constructor is ~16 microseconds, less than 0.1% of the 20 ms interval. – ShadowRanger Nov 02 '16 at 00:58
  • Avoiding the dict constructor turns out to be a little bit slower than the code with constructor calls. 'count' is a list because bokeh's ColumnDataSource assumes that each entry of the dict has the same length, so it crashes if you don't use lists. – jofroe Nov 07 '16 at 15:35
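
The back-of-the-envelope reasoning in the comments above can be reproduced with a short sketch. It measures the per-call cost of a dict literal and of the `dict` constructor, then estimates what share of a 20 ms interval the difference would take up. `n_sources = 10` is taken from `n` in the question; absolute timings depend on the machine and Python version, so treat the result as an order-of-magnitude estimate only.

```python
import timeit

calls = 200000

# Average cost of one dict creation, in seconds, for each spelling.
literal_s = timeit.timeit('{"a": 1, "b": 2}', number=calls) / calls
ctor_s = timeit.timeit('dict(a=1, b=2)', number=calls) / calls

n_sources = 10     # n in the question
budget_s = 0.020   # update() is called every 20 ms

# Extra time per update() call caused by using dict() instead of a literal.
extra_s = max(ctor_s - literal_s, 0.0) * n_sources
share = extra_s / budget_s

print('literal: %.3f us, dict(): %.3f us' % (literal_s * 1e6, ctor_s * 1e6))
print('extra per update(): %.3f us (%.4f%% of the 20 ms budget)'
      % (extra_s * 1e6, share * 100))
```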