Halide with C layout numpy arrays

Question

I am starting to use Halide, calling it from a Python environment. Within that environment, data is passed around as NumPy arrays that are actually aliases of C++ arrays defined elsewhere.

However, when I call the Halide function I get the error:

Constraint violated: img.stride.0 (520) == 1 (1)
Aborted (core dumped)

which can be "solved" by copying the numpy arrays to Fortran layout arrays:

img=np.copy(img,order="F")
res=np.copy(res,order="F")

with img and res being my input and output images. Note, however, that this involves extra copy operations, which are really bad for overall global memory access.
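To make the error and the workaround concrete, here is a minimal sketch of what the copy changes. It assumes the binding maps NumPy axis 0 to Halide dimension 0 and that Halide counts strides in elements rather than bytes; `element_strides` is a hypothetical helper, and the 4 x 520 array is a made-up stand-in for the real image.

```python
import numpy as np

def element_strides(arr):
    """Return strides in elements rather than bytes."""
    return tuple(s // arr.itemsize for s in arr.strides)

# Hypothetical stand-in for the image: 4 rows of 520 float32 values, C layout.
img = np.zeros((4, 520), dtype=np.float32)
print(element_strides(img))          # (520, 1): axis 0 is the strided one

img_f = np.copy(img, order="F")      # the workaround from the question
print(element_strides(img_f))        # (1, 4): axis 0 is now dense
print(np.shares_memory(img, img_f))  # False: the data was duplicated
```

The last line is the cost being complained about: the Fortran-order array satisfies the stride constraint only because it lives in a freshly allocated, rearranged buffer.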

How can I circumvent this problem? One way I have been thinking about is to tell Python that my arrays have Fortran layout and switch the indices accordingly. However, I currently use PyArray_SimpleNewFromData to get the Python arrays (without actually copying the data), and that results in C-style arrays.

If you consider one extra copy operation 'really bad', you shouldn't be using Python/NumPy, since overhead of this kind pretty much comes with the territory. That said, have you benchmarked whether this is your bottleneck? If so, you must be using Halide for some quite trivial operations. Which raises the question: why are you so concerned with performance in the first place? - Eelco Hoogendoorn, Dec 23 '15 at 19:23
Eelco, with the solutions below (i.e. changing the stride without copying data), overhead is down to 0.25 ms for a 512x512 float image, where before it was about 7.5 ms. Halide is about improving data locality and reducing bandwidth usage, so we should try to avoid any extra image copies. Note that I expect np.copy to be really bad because of its transpose operation. Perhaps we should re-implement np.copy using Halide to improve its performance... - Klamer Schutte, Dec 24 '15 at 12:12

score 2 · Answer 1 · answered Dec 23 '15 at 19:08

Halide natively expects row-major storage, but indexes things like so: im(col, row)... and this looks an awful lot like column-major storage to someone used to treating images as matrices, or using 2D arrays in C.

So your choices are to change your indexing to match Halide's notion, or to tell Halide that your memory layout is the other way around (stride(0) is large).
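The two conventions address the same memory, which a small offset calculation can illustrate. This is only a sketch: `flat_offset` is a hypothetical helper, and the image size and coordinates are made up.

```python
def flat_offset(index, strides):
    """Offset (in elements) of a multi-dimensional index given per-dimension strides."""
    return sum(i * s for i, s in zip(index, strides))

width, height = 5, 3   # hypothetical image size
row, col = 2, 4        # hypothetical pixel

# C / NumPy convention: arr[row, col], strides (width, 1).
c_offset = flat_offset((row, col), (width, 1))

# Halide convention: im(col, row), strides (1, width).
halide_offset = flat_offset((col, row), (1, width))

print(c_offset, halide_offset)  # 14 14
```

Both index orders land on the same element; what differs is which dimension is listed first, and therefore which one is expected to have stride 1.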

There's a tutorial that covers a closely related topic here: http://halide-lang.org/tutorials/tutorial_lesson_16_rgb_generate.html

The short version for 2D inputs and Funcs is:

image_param.set_stride(0, Expr()).set_stride(1, 1);
output_func.output_buffer().set_stride(0, Expr()).set_stride(1, 1);

The first set_stride call unconstrains the stride in dimension 0, and the second tells Halide it can assume the stride in dimension 1 is 1. If you do this, you'll want to vectorize your Halide Funcs across the second dimension, because that's the one that's dense in memory:

f(i, j) = ...;
f.vectorize(j, 4);  // j is the dense (stride-1) dimension here

score 0 · Accepted Answer · answered Dec 24 '15 at 11:57

The problem is that PyArray_SimpleNewFromData made a C-style ndarray from the data, whereas in the host C++ code the arrays are Fortran style. A solution is to convert the ndarrays just after they are created, which can be done with code like:

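def swap(img):
    # Reinterpret a 2D array in place as its transpose, without copying data:
    # swapping both shape and strides turns a C-layout array into a Fortran-layout one.
    (sh1, sh2) = img.shape
    (st1, st2) = img.strides
    img.shape = (sh2, sh1)
    img.strides = (st2, st1)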

After this, we can vectorize along dimension 0 (x) within Halide as usual.
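For 2D arrays, swapping shape and strides is also what NumPy's transpose attribute `.T` produces, as a new view rather than by modifying the array in place. The following is a minimal sketch of that alternative, not the code used in this answer; `swapped_view` is a hypothetical helper name and the data is made up.

```python
import numpy as np

def swapped_view(img):
    """Return a transposed view of a 2D array; no data is copied."""
    return img.T

# Hypothetical C-layout input: 3 rows of 4 float32 values.
img = np.arange(12, dtype=np.float32).reshape(3, 4)
view = swapped_view(img)

print(view.shape)                    # (4, 3)
print(view.strides)                  # (4, 16): axis 0 is now the dense one
print(view.flags["F_CONTIGUOUS"])    # True
print(np.shares_memory(img, view))   # True: same buffer, nothing copied
```

Because the view shares its buffer with the original, results written through it by Halide would appear in the original array as well.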
