
I'm having some trouble deciphering this cython error:

---> 83     return RVKDE_loop(S, X, sigmas, n, m)

_cython_magic_e23143d4f8f8ff2526f1f1d1d47eacfb.pyx in _cython_magic_e23143d4f8f8ff2526f1f1d1d47eacfb.RVKDE_loop()

TypeError: unsupported operand type(s) for -: 'float' and '_cython_magic_e23143d4f8f8ff2526f1f1d1d47eacfb._memoryviewslice'

Here's the cython spaghetti code that produced the error:

%%cython
cimport cython
import numpy as np
from libc.stdio cimport printf

ctypedef fused dfloat:
    cython.float
    cython.double
    int


def RVKDE_loop(dfloat[:] S, dfloat[:] X, dfloat[:] sigmas, int n, int m):
    total = []
    for i in range(n):
        norms = -(abs(X[i]-S)**2)
        exp_denom = (2*(sigmas[i]**2))
        a_denom = ((sigmas[i]*np.sqrt(2*np.pi))**m)
        a = (1/a_denom)
        value = a*np.exp(norms/exp_denom)
        total += [value]
    return (1/n)*np.sum(total,axis=1)

Here are some sample inputs:

S = np.array([-1.3015387, -0.07296862, 0.2387931, 0.38824359, 0.47182825, 0.75062962,
              1.3190391, 1.86540763, 2.62434536, 2.74481176])

X = np.array([-5., -3.77777778, -2.55555556, -1.33333333, -0.11111111, 1.11111111,
              2.33333333, 3.55555556, 4.77777778, 6.])

sigmas = np.array([2.43158064, 0.59796393, 0.27334147, 0.14160981, 0.14160981, 0.53204325,
                   1.06717756, 1.06717756, 0.21537329, 0.21537329])

n = 10
m = 1

Any tips on how to fix this would be great. Note that the function does what I want when it runs as regular Python code; it's just a bit too slow for my liking.

bug_spray
  • Cython's memoryviews aren't NumPy arrays, so you should not expect them to support broadcasting (that is basically what the error tells you). By the way, just cythonizing some NumPy code will not make it any faster; here is an example of how Cython and other frameworks can beat NumPy code: https://stackoverflow.com/a/54312134/5769463 – ead Mar 30 '20 at 19:16
  • I've gotten numpy broadcasting to work in very similar code with cython before though... the difference was I didn't use `X[i]` but just had a loop that looked like `for x in X`, and doing so provided quite a speedup. – bug_spray Mar 30 '20 at 19:19
  • It's not the numpy broadcasting I want to speed up, it's the for-loop (because n can get quite large) – bug_spray Mar 30 '20 at 19:20
  • The repeated index lookups are where I might start. `for x, sigma in zip(X, sigmas)` may help here. Also, `list += list` might be slower than `list.append(value)`, since the `+` operator creates a new list – C.Nivs Mar 30 '20 at 19:22
  • I assume profiling has shown you where the hot spots are; it always surprises me where the bottleneck is! One last piece of advice: cythonize with the `-a` option to see where Python interaction and a lot of overhead happen. – ead Mar 30 '20 at 19:27
  • @ead do you know the jupyter notebook equivalent of that? – bug_spray Mar 30 '20 at 19:27
  • Thanks @C.Nivs, I didn't know that about list += list – bug_spray Mar 30 '20 at 19:28
  • @C.Nivs regarding the zip() thing... do you know of a way to do that with 3 items (X, S, sigmas)? – bug_spray Mar 30 '20 at 19:44
  • @C.Nivs Repeated index lookups are really fast in Cython. Probably noticeably better than using `zip`. (Ignore my remark about `append` if you saw it; I was wrong about that.) – DavidW Mar 30 '20 at 19:55
  • @bug_spray if you do want to use `zip` it works fine with however many items you want. But if you're using Cython it's probably worse. – DavidW Mar 30 '20 at 19:55
  • @DavidW interesting, I'll have to do some digging into the Cython implementation of `zip`, didn't realize it was a bottleneck – C.Nivs Mar 30 '20 at 20:07
  • @DavidW reading a bit further, it appears `cdef int i` with the loop is indeed faster. I guess `zip` just doesn't compile down to C as nicely, since there is now an additional PyObject (the iterator) that needs to be manipulated, slowing it down. TIL – C.Nivs Mar 30 '20 at 20:17
  • @C.Nivs - `list += list` is an in-place add which just extends the current list, it doesn't create a new one. But `append` is still a bit faster here because you don't need to create the throwaway `[value]` list. – tdelaney Mar 31 '20 at 05:50
  • @tdelaney ah right, because that's the `__iadd__` method, not `__add__`, I guess now I'm just getting sloppy :) – C.Nivs Mar 31 '20 at 13:33
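
For what it's worth, here is a minimal sketch of the direction the comments point to: index element by element instead of subtracting a memoryview from a scalar. It is written in plain Python so it runs anywhere. The name `rvkde_loop` is hypothetical, and it assumes the intended result is, for each `X[i]`, the scaled sum of the Gaussian terms over all of `S` (which is what the original NumPy expression computes). In Cython the same loops would presumably get typed indices (`cdef Py_ssize_t i, j`) and `double[:]` arguments so that they compile down to C.

```python
import math


def rvkde_loop(S, X, sigmas, n, m):
    """Explicit-loop version of the question's computation (hypothetical name).

    For each i < n, sums a * exp(-(X[i] - S[j])**2 / (2 * sigmas[i]**2))
    over every j and scales the sum by 1/n. No broadcasting is used, so
    the same body works on sequences that only support indexing.
    """
    out = [0.0] * n
    for i in range(n):
        sigma = sigmas[i]
        exp_denom = 2.0 * sigma * sigma
        # Normalisation factor 1 / (sigma * sqrt(2*pi))**m, as in the question.
        a = 1.0 / (sigma * math.sqrt(2.0 * math.pi)) ** m
        acc = 0.0
        for j in range(len(S)):
            diff = X[i] - S[j]
            acc += math.exp(-(diff * diff) / exp_denom)
        out[i] = a * acc / n
    return out
```

If the broadcasting form is preferred instead, wrapping the memoryview as `np.asarray(S)` before the subtraction should also avoid the TypeError, although that keeps the per-iteration NumPy overhead.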
