
I have some Python 2.7 code I'm working on and it works great on any *nix type system. However, on Windows the same section of code can have wildly different execution times. Note my debug output below: t is the total time for each pass, s is the time to generate the data, and u is the time to send that data over serial to my device (all in milliseconds).

    t: 9 - s: 3 - u: 6
    t: 14 - s: 9 - u: 5
    t: 9 - s: 3 - u: 6
    t: 9 - s: 3 - u: 6
    t: 15 - s: 8 - u: 7
    t: 14 - s: 9 - u: 5
    t: 11 - s: 5 - u: 6
    t: 15 - s: 9 - u: 6
    t: 14 - s: 9 - u: 5
    t: 13 - s: 8 - u: 5
    t: 15 - s: 9 - u: 6
    t: 15 - s: 9 - u: 6
    t: 14 - s: 8 - u: 6
    t: 11 - s: 6 - u: 5
    t: 11 - s: 5 - u: 6
    t: 15 - s: 8 - u: 7
    t: 15 - s: 10 - u: 5
    t: 7 - s: 2 - u: 5
    t: 15 - s: 9 - u: 6
    t: 15 - s: 9 - u: 6
    t: 13 - s: 7 - u: 6
    t: 12 - s: 7 - u: 5
    t: 12 - s: 6 - u: 6
    t: 15 - s: 9 - u: 6
    t: 8 - s: 2 - u: 6
    t: 14 - s: 9 - u: 5
    t: 15 - s: 9 - u: 6
    t: 14 - s: 9 - u: 5
    t: 15 - s: 9 - u: 6
    t: 14 - s: 8 - u: 6
    t: 14 - s: 9 - u: 5
    t: 14 - s: 9 - u: 5
    t: 9 - s: 4 - u: 5
    t: 11 - s: 5 - u: 6

The serial send time is not a problem and is generally very consistent. It's the "s" step, where it actually generates the data, that is problematic, taking anywhere from 2 to 9 milliseconds. Quite a huge swing! On Debian (running on a Raspberry Pi, even) this portion takes a very consistent 11-12 ms.

There's a lot of other code around this but the step denoted by the "s" time is basically this:

    buf = [wheel_helper(self._vector[y][x], h, s) for y in range(h) for x in range(w)]
    buf = [i for sub in buf for i in sub]
    self._led.setBuffer(buf)

It generates a rainbow pattern on a matrix, where each pixel's color is based on its distance from the center. But that's all it does, every pass. I see no reason why it should vary so wildly in time.
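For reference, the t/s/u numbers are just wall-clock deltas around the two steps. This is only a minimal sketch of that measurement, not the real code; generate_buffer and send_serial are hypothetical stand-ins:

    import time

    def timed_pass(generate_buffer, send_serial):
        # "s": time to generate the frame data
        t0 = time.time()
        buf = generate_buffer()
        s = (time.time() - t0) * 1000.0

        # "u": time to push that data out over the serial port
        t1 = time.time()
        send_serial(buf)
        u = (time.time() - t1) * 1000.0

        # "t": total time for the pass (roughly s + u)
        print "t: %d - s: %d - u: %d" % (round(s + u), round(s), round(u))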

Any thoughts?

Update: You can generally ignore the code that I'm running for the "s" step. This is just one example of many, all of which run in a very variable amount of time. Some are using range, some aren't. It's all over the board, but it's always a problem.

Update 2:

Ok, I did some further testing and made an example that's very simple and doesn't use range! It calculates the first 1000 numbers of the Fibonacci sequence, 1000 times. Pretty simple, right? But on Windows the slowest run can take nearly 3.8x as long as the quickest (Max/Min in the example output below). All timing values are milliseconds.

    import time
    import math

    min = 10
    max = 0
    avg = 0
    sum = 0
    count = 0

    def msec():
        return time.clock() * 1000.0

    def doTime(start):
        global min
        global max
        global avg
        global sum
        global count
        diff = msec() - start
        if diff < min: min = diff
        if diff > max: max = diff
        sum += diff
        avg = sum / count
        print "Curr {:.3f} | Min {:.3f} | Max {:.3f} | Max/Min {:.3f} | Avg {:.3f}".format(diff, min, max, max/min, avg)

    h = 24
    w = 24
    while count < 1000:
        start = msec()

        # calculate the first 1000 numbers in the Fibonacci sequence
        x = 0
        while x < 1000:
            a = int(((((1 + math.sqrt(5)) / 2) ** x) - (((1 - math.sqrt(5)) / 2) ** x)) / math.sqrt(5))
            x += 1

        count += 1
        doTime(start)

Turns out that Mac isn't immune, but it's better: the slowest run is only about 75% longer than the quickest. I tried to run it on Linux, but it seems it won't do microsecond timing resolution, so the numbers were rounded badly to the nearest ms.

Windows: Curr 2.658 | Min 2.535 | Max 9.524 | Max/Min 3.757 | Avg 3.156

Mac: Curr 1.590 | Min 1.470 | Max 2.577 | Max/Min 1.753 | Avg 1.554
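
As an aside on the measurement itself: in Python 2.7, time.clock() is a high-resolution wall clock on Windows but CPU time with coarse resolution on Linux, which is likely why the Linux numbers rounded so badly. A sketch of the same msec() helper built on timeit.default_timer, which picks a suitable wall-clock timer per platform:

    import timeit

    def msec():
        # timeit.default_timer is time.clock on Windows and time.time elsewhere,
        # so the same helper gets a reasonable-resolution wall clock on each OS
        return timeit.default_timer() * 1000.0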
Adam Haile
    May be a problem with "range"? Have you tried the same code piece with "xrange"? – Syed Mauze Rehan Jul 15 '14 at 04:53
  • What does `wheel_helper` do? – BrenBarn Jul 15 '14 at 05:11
  • In Python 2.X, when you're iterating through a range, use `xrange`. In 2.X `range` creates a new `list` object, while `xrange` returns a `generator`. In Python 3.X `xrange` replaces `range`. – Benproductions1 Jul 15 '14 at 05:13
  • Note updated question... it happens with other code too. That was just one example. wheel_helper just gets a color based on a position from 0 to 384 – Adam Haile Jul 15 '14 at 11:22
  • Updated with new test code... no range :) – Adam Haile Jul 15 '14 at 13:37
  • After more testing, I don't think this is just windows... http://stackoverflow.com/questions/24762030/ There was too much history with this... would've had to just throw the whole thing away since it's all moot. – Adam Haile Jul 15 '14 at 15:23

2 Answers


I believe that it is due to range. Older versions of Python (2.X) have both range and xrange for this purpose. But xrange is more efficient than range, since xrange produces values only as they are needed (it returns a lazy xrange object rather than building a list), while range generates the entire list up front (hence inefficient; suppose you have a list from 1 to 100000).

So if your list size is increasing, the use of range could be why your process is slowing down.
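
To make that difference concrete, a small Python 2 sketch (not from the question's code):

    # Python 2.x
    r = range(1000000)     # builds the full million-element list in memory up front
    x = xrange(1000000)    # lazy sequence object; values are produced as you iterate

    print type(r)   # <type 'list'>
    print type(x)   # <type 'xrange'>

    # both yield the same values when iterated
    print list(xrange(5)) == range(5)   # True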

Syed Mauze Rehan
  • Note updated question above. It happens even when I don't use range. – Adam Haile Jul 15 '14 at 11:22
  • Also... why would range over xrange make the time it takes to do the same thing change? I could see it just being slower, but varying in execution time by a factor of 5??? – Adam Haile Jul 15 '14 at 12:53

I don't see anything problematic. You have code that performs serial IO, which will block your process and possibly invoke the scheduler. If that happens, the scheduler will first give control to a different process, and only when that one yields or exceeds its timeslice will it re-schedule your process.

The question for you is: what is the size of the scheduler timeslice on the systems you are running? I believe that this will give you an insight into what is happening.
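
If scheduling is the culprit, one informal way to test it (just a sketch, assuming the third-party psutil package is installed; not part of the original answer) is to raise the process priority and see whether the worst-case spikes shrink:

    import psutil

    p = psutil.Process()  # current process
    if psutil.WINDOWS:
        # use a Windows priority class
        p.nice(psutil.HIGH_PRIORITY_CLASS)
    else:
        # Unix niceness; negative values need elevated privileges
        p.nice(-10)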

Ulrich Eckhardt
  • As he mentions in the question, "It's the 's' step, where it actually generates the data, that is problematic, taking anywhere from 2 to 9 milliseconds." Don't you think "range" has something to do with list generation? As we don't know the size of the list, couldn't it be because of that? – Syed Mauze Rehan Jul 15 '14 at 06:29
  • Maybe... but the serial output part is what is actually constant. The timing for "s" is specific to just that portion. The list is variable. Could be 12 could be 2000. In this particular instance, it was 576. – Adam Haile Jul 15 '14 at 11:24
  • Now I think his answer makes more sense to me. – Syed Mauze Rehan Jul 15 '14 at 14:09
  • To be honest, I ignored the statement that it doesn't include the serial IO and just assumed that `self._led.setBuffer(buf)` was the actual culprit. A rather static loop calculating a matrix shouldn't vary in duration so much, but in order to make such a statement, it would help to have the average, standard deviation and min/max values. – Ulrich Eckhardt Jul 15 '14 at 17:12
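
Picking up on that last comment, a minimal stdlib sketch of collecting the average, standard deviation, and min/max across passes (not the author's code):

    import math

    samples = []  # per-pass durations in milliseconds

    def record(diff):
        samples.append(diff)

    def report():
        n = len(samples)
        mean = sum(samples) / n
        stddev = math.sqrt(sum((d - mean) ** 2 for d in samples) / n)
        print "min %.3f | max %.3f | avg %.3f | stddev %.3f" % (
            min(samples), max(samples), mean, stddev)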