
To test the claim that the PyPy JIT is significantly faster, I wrote a simple script that repeatedly adds two arrays of size 1200x1200. My code is as follows:

import random
import timeit

import numpy as np

a = np.zeros((1200, 1200), dtype=np.float32)
b = np.zeros((1200, 1200), dtype=np.float32)

# Start timer
start = timeit.default_timer()

# Initialize the arrays with random values
for j in range(1200):
    for k in range(1200):
        a[j][k] = random.random()
        b[j][k] = random.random()

# Repeatedly add the arrays
for j in range(10):
    a = np.add(a, b)

# Stop timer and display the elapsed time
stop = timeit.default_timer()
print(stop - start)

With normal Python the execution takes about 1.2 - 1.5 sec. With PyPy, however, it takes more than 15 sec. Also, in the above case I added the arrays only 10 times; if I increase this value to 1000, my computer stops responding. I found that this was because almost the entire RAM was consumed while using PyPy. Am I doing something wrong, or is the issue something else?

Anirban Dutta
  • Most of the time measured comes from constructing the arrays, not from the addition. What happens if you replace `random.random()` with a constant value like `1`? Lacking any experience with PyPy I'm just guessing, but maybe this function does not like to be JITed. – MB-F Mar 01 '17 at 16:59
  • Note that a more efficient way to construct the arrays would be `np.random.rand(1200, 1200).astype(np.float32)` (see the sketch after these comments). – MB-F Mar 01 '17 at 17:04
  • `.astype` is very slow in PyPy's NumPy. Quite a few functions in PyPy+NumPy are considerably slower than their CPython counterparts, for now. – J.J Mar 02 '17 at 03:33
  • I should say, though: if you want to build structured tables and you don't really need to do anything other than insert and pull out values, PyPy's CFFI is about 1000x faster than NumPy. It won't do checks to make sure you don't put overly large/small numbers into the wrong place, but it's as fast as C, except in Python. – J.J Mar 02 '17 at 03:38
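
For reference, here is a variant of the benchmark with the vectorized initialization the comments suggest, so the timer mostly measures the additions (a sketch based on the comments, not code from the original post):

import timeit

import numpy as np

start = timeit.default_timer()

# Vectorized initialization avoids the Python-level double loop entirely.
a = np.random.rand(1200, 1200).astype(np.float32)
b = np.random.rand(1200, 1200).astype(np.float32)

# The additions are now the bulk of what the timer sees.
for j in range(10):
    a = np.add(a, b)

stop = timeit.default_timer()
print(stop - start)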

2 Answers


The JIT cannot help in this case since very little time is spent in Python code. NumPy is written in C, so the JIT cannot look into that code and make it faster. In fact, PyPy suffers here: the impedance mismatch between PyPy, written in RPython, and NumPy, written in C, means that each time a NumPy function is called from PyPy, additional conversion code must run to prepare for and make the C call.

CFFI was written specifically for this use case: the conversions necessary to call into C are taken care of at object-creation time, so the program can run both more seamlessly.
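
As a rough illustration of what calling C through CFFI looks like (a minimal sketch, not from the original answer; the library name "m" resolves to libm on Linux and differs per platform):

import cffi

ffi = cffi.FFI()

# Declare the C signature we intend to call.
ffi.cdef("double sqrt(double x);")

# Load the C math library; the conversion machinery is set up here, once.
libm = ffi.dlopen("m")

# The call itself goes straight into C with minimal per-call overhead.
print(libm.sqrt(2.0))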

The memory problems are a separate issue and should eventually be fixed; see the other answer below, which addresses the garbage-collection question.

mattip

PyPy doesn't garbage-collect NumPy arrays in all circumstances, and that's likely the reason you're running out of memory, spilling to disk, and then locking up. See:

numpy.ndarray objects not garbage collected

Reducing numpy memory footprint in long-running application

Memory profiler for numpy

There are two solutions. The easiest is to simply tell PyPy to delete the array and force a collection:

import gc

del my_array   # drop the reference to the array
gc.collect()   # force PyPy to run a full garbage collection

This will force PyPy to do a garbage collection. Note that gc.collect() shouldn't be put into tight loops unless it's really necessary.
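
Applied to the loop from the question, that might look like the following (a sketch, not part of the original answer; collecting every 100 iterations is an arbitrary choice):

import gc

import numpy as np

a = np.zeros((1200, 1200), dtype=np.float32)
b = np.ones((1200, 1200), dtype=np.float32)

for j in range(1000):
    old = a
    a = np.add(a, b)     # np.add allocates a fresh result array each pass
    del old              # drop the reference to the previous array
    if j % 100 == 0:
        gc.collect()     # occasional full collection, kept out of every pass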

The second, more manual solution is to allocate the arrays yourself with CFFI and tell NumPy about them via the array interface: https://docs.scipy.org/doc/numpy/reference/arrays.interface.html

This way you can still manipulate the data from NumPy, but you keep control over deleting and resizing the array yourself.
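
A minimal sketch of that approach, assuming a POSIX libc for malloc/free (these names are not prescribed by the answer); for brevity it wraps the C memory in a NumPy view with np.frombuffer rather than hand-writing __array_interface__:

import cffi
import numpy as np

ffi = cffi.FFI()
ffi.cdef("void *malloc(size_t size); void free(void *ptr);")
libc = ffi.dlopen(None)          # the C runtime; works on POSIX systems

rows, cols = 1200, 1200
nbytes = rows * cols * 4         # float32 takes 4 bytes per element

raw = libc.malloc(nbytes)        # memory that PyPy's GC does not manage
buf = ffi.cast("float *", raw)

# View the C memory as a NumPy array without copying.
a = np.frombuffer(ffi.buffer(buf, nbytes), dtype=np.float32).reshape(rows, cols)
a[:] = 0.0

# ... use `a` like any other NumPy array ...

del a                            # drop the NumPy view first
libc.free(raw)                   # then release the memory explicitly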

J.J