2

I am considering switching from MATLAB to Python. The core of my MATLAB code repeatedly calls erf on an array of a few thousand numbers, like:

r=rand(1,1e5)

erf(r)

This is my implementation in Python:

import numpy as np
import scipy.special as sps

r=np.random.rand(1e5)

sps.erf(r)

The Python version takes about three times as long. If I use Cython to compile just this core of the program, will I see a major speedup? I have very little Python experience, and no C experience, so I thought I'd check here before trying to figure out Cython.

Mr. W.
  • 359
  • 3
  • 12
  • 1
    How did you measure the python version? Calculating erf on a few thousand numbers doesn't seem to be too demanding. Also the context may be important, because "python is slow" the bigger bottleneck will be how often your code does call back to python instead of doing lifting inside of numpy. Doing "more complete" (in the sense of more of the algorithm) testing might yield more detailed information. – syntonym Mar 20 '15 at 16:02
  • By the way, calling `sps.erf(np.random.rand(1e5))` takes about 4ms on my computer. If this really is your use case (and not _calling that function 1000 times inside a loop_) then you've spent a lot more time thinking about this function than you'll ever save. – Carsten Mar 20 '15 at 16:23
  • Is the performance difference mainly due to `erf` or `rand`? I'd guess the latter. If you really want to speed up your code, implement a faster random number generator yourself in either Matlab mex or whatever the Cython equivalent to that is. For example: [Double precision SIMD-oriented Fast Mersenne Twister](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html#dSFMT) or the super-simple and even faster [xorshift+ and xorshift* generators](http://xorshift.di.unimi.it) if you don't need quite as long a period. – horchler Mar 20 '15 at 16:45
  • 1
    @Carsten: I'm doing MCMC sampling, so I'm calling `erf` millions or billions of times. – Mr. W. Mar 20 '15 at 18:00
  • @horchler: I tried taking `rand` out of the equation (I only used it here as a placeholder for real data), and it still holds that MATLAB's `erf` is faster. – Mr. W. Mar 20 '15 at 18:03
  • If you get rid of the random numbers, it turns out that MATLAB's `erf` is actually 3x faster. – Mr. W. Mar 20 '15 at 18:20
  • Matlab's JIT compilation is probably responsible for part of that. Also, have you compiled Python using Intel's MKL and other optimized libraries? See [this question](http://stackoverflow.com/q/2133031/2278029). – horchler Mar 20 '15 at 21:16
  • @horchler: I just got [Anaconda Accelerate](https://store.continuum.io/cshop/accelerate/), which apparently has MKL-optimized NumPy and SciPy, and saw no improvement. – Mr. W. Mar 21 '15 at 00:35

1 Answers1

6

I would not expect Cython to be able to speed this up. All of the time-consuming work is being done inside sps.erf and np.random.rand; the Python interpreter is just passing arrays around.

In MATLAB terms, Cython is most likely to help when you have not been able to fully vectorize your calculations.

You might want to consider reporting this performance differential to the SciPy developers. Perhaps there is a bug that is simple to fix.

zwol
  • 135,547
  • 38
  • 252
  • 361
  • 1
    The SciPy erf uses an implementation more like http://www.mathworks.com/matlabcentral/fileexchange/38787-faddeeva-package--complex-error-functions than like the standard matlab erf. Some of the standard matlab erf vs. scipy erf differences may be due to tradeoffs for accuracy. – identity-m Mar 20 '15 at 20:03