
I have been comparing the relative efficiency of numpy versus Python list comprehensions in multiplying together arrays of random numbers. (Python 3.4/Spyder, Windows and Ubuntu).

As one would expect, for all but the smallest arrays numpy rapidly outperforms a list comprehension, and with increasing array length you get the expected sigmoid curve for performance. But the sigmoid is far from smooth, which I am struggling to understand.

Obviously there is a certain amount of quantization noise for shorter array lengths, but I am getting unexpectedly noisy results, particularly under Windows. Each figure is the mean of 100 runs at the given array length, so I would have thought any transient effects would be smoothed out.
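
To illustrate the kind of test being run, here is a minimal sketch (not the exact script behind the figures below; the array lengths and the use of time.perf_counter here are purely illustrative):

    import random
    import time

    import numpy as np

    def bench(n, runs=100):
        # Multiply two length-n sequences of random numbers, timing a list
        # comprehension against numpy and averaging over `runs` repetitions.
        a = [random.random() for _ in range(n)]
        b = [random.random() for _ in range(n)]
        na, nb = np.array(a), np.array(b)

        t0 = time.perf_counter()
        for _ in range(runs):
            [x * y for x, y in zip(a, b)]
        t_list = (time.perf_counter() - t0) / runs

        t0 = time.perf_counter()
        for _ in range(runs):
            na * nb
        t_numpy = (time.perf_counter() - t0) / runs

        return t_list / t_numpy   # ratio > 1 means numpy is faster

    for n in (1, 10, 100, 1000, 10000, 100000):
        print("{:>9,}  {:5.1f}".format(n, bench(n)))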

Performance characteristic

[Figure: numpy vs. Python list comprehension performance comparison]

The figures below show, for arrays of differing lengths, the ratio of the list-comprehension time to the numpy time (so values above 1 mean numpy is faster).

Array Length    Windows     Ubuntu
           1        0.2        0.4
           2        2.0        0.6
           5        1.0        0.5
          10        3.0        1.0
          20        0.3        0.8
          50        3.5        1.9
         100        3.5        1.9
         200       10.0        3.0
         500        4.6        6.0
       1,000       13.6        6.9
       2,000        9.2        8.2
       5,000       14.6       10.4
      10,000       12.1       11.1
      20,000       12.9       11.6
      50,000       13.4       11.4
     100,000       13.4       12.0
     200,000       12.8       12.4
     500,000       13.0       12.3
   1,000,000       13.3       12.4
   2,000,000       13.6       12.0
   5,000,000       13.6       11.9

So I guess my question is: can anyone explain why the results, particularly under Windows, are so noisy? I have run the tests multiple times, but the results always seem to be exactly the same.

UPDATE. At Reblochon Masque's suggestion I have disabled garbage collection, which smooths out the Windows performance somewhat, but the curves are still lumpy.
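
For reference, a minimal sketch of how collection can be switched off around the timing loop (run_benchmarks is a hypothetical stand-in for the actual test code, which is not shown here):

    import gc

    gc.disable()           # suspend automatic garbage collection during timing
    try:
        run_benchmarks()   # hypothetical stand-in for the timing loop
    finally:
        gc.enable()        # restore normal collection afterwards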

Updated performance characteristic without garbage collection

[Figure: numpy vs. Python list comprehension performance comparison, garbage collection disabled]

Array Length    Windows     Ubuntu
           1        0.1        0.3
           2        0.6        0.4
           5        0.3        0.4
          10        0.5        0.5
          20        0.6        0.5
          50        0.8        0.7
         100        1.6        1.1
         200        1.3        1.7
         500        3.7        3.2
       1,000        3.9        4.8
       2,000        6.5        6.6
       5,000       11.5        9.2
      10,000       10.8       10.7
      20,000       12.1       11.4
      50,000       13.3       12.4
     100,000       13.5       12.6
     200,000       12.8       12.6
     500,000       12.9       12.3
   1,000,000       13.3       12.3
   2,000,000       13.6       12.0
   5,000,000       13.6       11.8

UPDATE

At @Sid's suggestion, I've restricted it to running on a single core on each machine. The curves are slightly smoother (particularly the Linux one), but they still show the inflexions and some noise, particularly under Windows.

(It was actually the inflexions that I was originally going to post about, as they appear consistently in the same places.)

[Figure: numpy vs. Python list comprehension performance comparison, garbage collection disabled and running on 1 CPU]

Array Length    Windows     Ubuntu
           1        0.3        0.3
           2        0.0        0.4
           5        0.5        0.4
          10        0.6        0.5
          20        0.3        0.5
          50        0.9        0.7
         100        1.0        1.1
         200        2.8        1.7
         500        3.7        3.3
       1,000        3.3        4.7
       2,000        6.5        6.7
       5,000       11.0        9.6
      10,000       11.0       11.1
      20,000       12.7       11.8
      50,000       12.9       12.8
     100,000       14.3       13.0
     200,000       12.6       13.1
     500,000       12.6       12.6
   1,000,000       13.0       12.6
   2,000,000       13.4       12.4
   5,000,000       13.6       12.2


TimGJ
  • Garbage collection? Maybe you could disable it to run your tests? – Reblochon Masque Apr 05 '16 at 13:42
  • here: http://stackoverflow.com/questions/20495946/why-disable-the-garbage-collector – Reblochon Masque Apr 05 '16 at 13:49
  • 1
  • @ReblochonMasque Thanks for that. It has improved the smoothness of the curve considerably, although there is still a certain amount of noise. I'm also still getting inflections in the curve even for high values of n, which I presume are just noise... – TimGJ Apr 05 '16 at 14:15
  • Good - it is always tricky to run benchmarks - other processes running on your system may affect performance; maybe you could re-try with a fresh boot and as few other processes running as possible. – Reblochon Masque Apr 05 '16 at 14:20

2 Answers


The garbage collector explains the bulk of it. The rest could be fluctuation caused by other programs running on your machine. How about turning most things off, running the bare minimum, and testing that? Since you are using datetime (i.e. elapsed wall-clock time), the measurement also includes any processor context switches.
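
To illustrate the point (a sketch, not the original test code): datetime measures elapsed wall-clock time, which includes any time the process spends switched out, whereas time.process_time only counts CPU time charged to the process itself, so comparing the two can show how much the context switches matter.

    import time
    from datetime import datetime

    wall_start = datetime.now()        # wall-clock: includes time spent switched out
    cpu_start = time.process_time()    # CPU time charged to this process only

    [x * x for x in range(1000000)]    # some work to time

    wall = (datetime.now() - wall_start).total_seconds()
    cpu = time.process_time() - cpu_start
    print("wall: {:.4f}s  cpu: {:.4f}s".format(wall, cpu))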

You could also try running this while pinning it to a single processor using a unix call; that might help smooth it out further. On Ubuntu it can be done like this: https://askubuntu.com/a/483827

For Windows, processor affinity can be set as described here: http://www.addictivetips.com/windows-tips/how-to-set-processor-affinity-to-an-application-in-windows/
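
Both can also be done from inside the script itself, assuming the third-party psutil package is available (psutil is not mentioned in either link; it is just one option, and its cpu_affinity call works on both Windows and Linux):

    import psutil

    p = psutil.Process()     # the current Python process
    p.cpu_affinity([0])      # pin it to CPU core 0
    print("running on cores:", p.cpu_affinity())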

Sid
  • Most things are turned off - I just set the test running and leave both systems to their own devices. There is nothing much running on either which would account for such significant (and consistent) discrepancies AFAIK, but obviously _something_ is causing it. – TimGJ Apr 05 '16 at 14:23
  • Can you try pinning the program to a single CPU/core and see how that fares? This should be easy to do on Ubuntu: http://askubuntu.com/a/483827 – Sid Apr 05 '16 at 14:25
  • Many thanks for the suggestion. The Linux implementation of it is actually scaling relatively smoothly, albeit with a couple of unexpected inflections. I will try the single core thing, but I don't know if that's even possible under Windows which is where the performance is significantly lumpier. – TimGJ Apr 05 '16 at 14:30
  • Here's how you can do it for windows http://www.addictivetips.com/windows-tips/how-to-set-processor-affinity-to-an-application-in-windows/ – Sid Apr 05 '16 at 14:31
  • Thanks for that. Have tried it, which seems to improve things somewhat (see updated post), but I'm still getting noise on the Windows curve plus inflexions, which I just can't understand. – TimGJ Apr 05 '16 at 14:52
  • The fact that they are at the same places might be insightful. Maybe there is a memory allocation taking place at those points (that is a numpy array resize or something). – Sid Apr 05 '16 at 14:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/108298/discussion-between-timgj-and-sid). – TimGJ Apr 05 '16 at 15:40

From my comments:

Usually garbage collection explains noise in benchmark performance tests; it is possible to disable it while running the tests, which under some conditions will smooth out the results.

Here is a link explaining how and why to disable the GC: Why disable the garbage collector?

Outside of the GC, it is always tricky to run benchmarks, as other processes running on your system may affect performance (network connections, system backups, etc. that may be automated and silently running in the background); maybe you could re-try with a fresh system boot and as few other processes running as possible, to see how that goes?

Reblochon Masque