6

I'm trying to program a function in cython for a monte-carlo simulation. The function involves multiple small linear algebra operations, like dot products and matrix inversions. As the function is being called hundred of thousands of times the numpy overhead is getting a large share of the cost. Three years ago some one asked this question: calling dot products and linear algebra operations in Cython? I have tried to use the recommendations from both answers, but the first scipy.linalg.blas still goes through a python wrapper and I'm not really getting any improvements. The second, using the gsl wrapper is also fairly slow and tends to freeze my system when the dimensions of the vectors are very large. I also found the Ceygen package, that looked very promising but it seems that the installation file broke in the last Cython update. On the other hand I saw that scipy is working on a cython wrapper for lapack, but it looks as its still unavailable (scipy-cython-lapack) In the end I can also code my own C routines for those operations but it seems kind of re-inventing the wheel.

So as to summarize: Is there a new way to this kind of operations in Cython? (Hence I don't think this is a duplicate) Or have you found a better way to deal with this sort of problem that I haven't seen yet?

Obligatory code sample: (This is just an example, of course it can still be improved, but just to give the idea)

 cimport numpy as np
 import numpy as np

 cpdef double risk(np.ndarray[double, ndim=2, mode='c'] X,
     np.ndarray[double, ndim=1, mode='c'] v1, 
     np.ndarray[double, ndim=1, mode='c'] v2):

     cdef np.ndarray[double, ndim=2, mode='c'] tmp, sumX
     cdef double ret

     tmp = np.exp(X)
     sumX = np.tile(np.sum(tmp, 1).reshape(-1, 1), (1, tmp.shape[0]))
     tmp = tmp / sumX
     ret = np.inner(v1, np.dot(X, v2))
     return ret

Thanks!!

tl;dr: how-to linear algebra in cython?

Community
  • 1
  • 1
BVJ
  • 568
  • 4
  • 18
  • 1
    On the BLAS side of things, have you looked at [`tokyo`](https://github.com/tokyo/tokyo)? – ali_m Jul 18 '15 at 00:14
  • Yes but it's quite limited and it seems as its not being maintained (last commit was two years ago I think) – BVJ Jul 18 '15 at 00:36
  • 1
    Although tokyo is incomplete, I think that general approach is still quite instructive. You could either use the current dev version of scipy, which seems to have Cython wrappers for BLAS as well as LAPACK, or in the mean time you could write an ad hoc `.pxd` header to expose the specific routines you need. – ali_m Jul 18 '15 at 01:24
  • what's the dimensions of these arrays. If small enough it may be fastest to write out the algegraic expression in full `c` detail. – hpaulj Jul 18 '15 at 01:24
  • f.y.i. The Cython interfaces in Scipy were recently presented at the 2015 Scipy conf: https://www.youtube.com/watch?v=R4yB-8tB0J0 – Caleb Hattingh Jul 18 '15 at 04:03
  • Thank you all for the comments. Ali_m: thats not a bad idea. Ill keep that option in mind if nothing prebuilt shows up. Hpaulj: these array vary in dimension. The problem is there are many outer products involved so there is a lot of indexing to be done. Cjrh: thanks for the link. I cant seem to find a way to install the dev version of scipy, but that would be a great solution. – BVJ Jul 18 '15 at 04:25
  • @BVJ Heres the [link](https://github.com/scipy/scipy/releases/tag/v0.16.0rc1). To install just run `python setup.py install`. Its a release candidate not a beta so it is less likely to have issues. – rohanp Jul 18 '15 at 15:39

2 Answers2

1

The answer you link to is still a good way to call BLAS function from Cython. It is not really a python wrapper, Python is merely used so get the C pointer to the function and this can be done at initialization time. So you should get essentially C-like speed. I could be wrong, but I think that the upcoming Scipy 0.16 release will provide a convenient BLAS Cython API, based on this approach, it will not change things performance wise.

If you didn't experience any speed up after porting to Cython of repeatedly called BLAS functions, either the python overhead for doing this in numpy doesn't matter (e.g. if the computation itself is the most expensive part) or you are doing something wrong (unnecessary memory copies etc.)

I would say that this approach should be faster and easier to maintain than with GSL, provided of course that you compiled scipy with an optimized BLAS (OpenBLAS, ATLAS, MKL, etc).

rth
  • 10,680
  • 7
  • 53
  • 77
  • 2
    Hi thanks. It seems the last part was the problem. I installed ATLAS and I'm getting much better results now. It does seems quite a lot of work just to make a matrix product, so the scipy BLAS Cython API could save a lot of work. btw, I get very similar results by using Tokyo's no-verification functions, so for those who need only the simple operations, it is definitely easier to use. – BVJ Jul 18 '15 at 16:55
0

As part of an einsum patch, I wrote a Python simulation of the function's c code. While I parsed the string in Python, I used cython to write the sum-of-products calculation.

That pyx is at https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx

You might even be able to call the compile einsum module with going through a Python call. There have a number of SO questions comparing np.dot and np.einsum, which is better for different calculations.

http://docs.cython.org/src/userguide/memoryviews.html

Look also at cython memoryviews - you might be able to use those, or cython arrays, much like you would numpy arrays. This doc mentions for example that you can add a dimension with [None,:]. Does that mean you can do the common numpy outer product via broadcasting?

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Thanks. As I wrote above, I could rewrite in C or Cython the operations again as you did, but I really believe in the saying "Do not reinvent the wheel, unless you really want to learn about wheels". Also your point about memoryviews, as far as I'm concerned you cant directly multiply memoryviews so you will end up looping again. I think I might be on to something by mixing cpython arrays and the tokyo implementation...I'll post if I get anything interesting. – BVJ Jul 18 '15 at 14:21