I am aware of this question, but I was looking for a simpler way to generate 2d memoryviews from C arrays. Since I am a C and Cython noobie, could someone please explain why something like

cdef int[:, :] get_zeros(int d):
    # get 2-row array of zeros with d as second dimension
    cdef int i
    cdef int *arr = <int *> malloc(sizeof(int) * d)
    for i in range(d):
        arr[i] = 0
    cdef int[:, :] arr_view
    arr_view[0, :] = <int[:d]>arr
    arr_view[1, :] = <int[:d]>arr
    return arr_view

won't work?

When compiling it I get `Cannot assign type 'int[::1]' to 'int'` as the error. Does this mean that the 2d memoryview is collapsed to 1d by the first assignment statement, or is it because memoryviews need contiguous blocks etc.?

zeawoas

2 Answers


It's obviously quite hard to "explain why something [...] won't work", because ultimately it's just a design decision that could have been taken differently. But:

Cython memoryviews are designed to be pretty dumb. All they do is provide some nice syntax to access the memory of something that implements the Python buffer protocol, and then have a tiny bit of additional syntax to let you do things like get a 1D memoryview of a pointer.

Additionally, the memoryview as a whole wraps something. When you create cdef int[:, :] arr_view it's invalid until you do arr_view = something. Attempts to assign to part of it are nonsense, since (a) it'd delegate the assignment to the thing it wraps using the buffer protocol and (b) exactly how the assignment would work would depend on what format of buffer protocol you were wrapping. What you've done might be valid if wrapping an "indirect" buffer protocol object but would make no sense if wrapping a contiguous array. Since arr_view could be wrapping either the Cython compiler has to treat it as an error.

The question you link to implements the buffer protocol and so is the correct way to implement this kind of array. What you're attempting to do is to take the extra syntax that gives a 1D memoryview from a pointer and force that into part of a 2D memoryview in the vague hope that this might work. This requires a lot of logic that goes well beyond the scope of what a Cython memoryview was designed to do.


There's probably a couple of additional points worth making:

  • Memoryviews of pointers don't handle freeing of the pointer (since it'd be pretty much impossible for them to second-guess what you want), so you have to handle that logic yourself. Your current design would leak memory, if it worked. In the design you linked to, the wrapping class could implement this in __dealloc__ (although it isn't shown in that answer), which is much better.

  • My personal view is that "ragged arrays" (2D arrays of pointers to pointers) are awful. They require a lot of allocation and deallocation. There's lots of opportunity to half-initialize them. Access to them requires a couple of levels of indirection and so is slow. The only thing going for them is that they provide an arr[idx1][idx2] syntax in C. In general I much prefer Numpy's approach of allocating a 1D array and using shape/strides to work out where to index. (Obviously if you're wrapping an existing library then it may not be your choice...)

DavidW
  • Thanks for the detailed answer! I already thought that what I was trying to achieve there felt a bit like cheating. Just like in your suggestion, I implemented a 1D array for my problem in the end. – zeawoas Oct 31 '19 at 21:58

In addition to the wonderful answer @DavidW has provided, I would like to add some more info. In your included code, I see that you are malloc-ing an array of ints and then zeroing out the contents in a for-loop. A more convenient way of accomplishing this is to use C's calloc function instead, which guarantees a pointer to zeroed memory and would not require a for loop afterwards.

Additionally, you could create a single int * that points to an "array" of data that is calloced to a total size of 2 * d * sizeof(int). This would ensure that both of the "rows" of data are contiguous with each other instead of separate and ragged. This could then be cast directly to a 2d memoryview.

As promised in the comments, here is what that conversion code could look like (with calloc use included):

from libc.stdlib cimport calloc

cdef int[:, :] get_zeros(int d):
    cdef int *arr = <int *>calloc(2 * d, sizeof(int))
    cdef int[:, :] arr_view = <int[:2, :d]>arr
    # note: nothing frees arr automatically; the caller must free() it later
    return arr_view

There also appears to be a calloc equivalent in the Python C API per the docs, if you want to try it out. However, it does not appear to be wrapped in Cython's mem.pxd module, which is likely why you were not able to find it. You could declare a similar extern block in your code to wrap it like the other functions included in that link.

And here is a bonus link if you want to know more about writing an allocator to dole out memory from a large block if you go the pre-allocation route (i.e. what PyMem_* functions likely do behind the scenes, but more tunable and under your control for your specific use case).

CodeSurgeon
  • 1
    Thanks for the input! I already thought about `calloc`, however the [Cython documentation](https://cython.readthedocs.io/en/latest/src/tutorial/memory_allocation.html) recommends importing the C-API functions for managing memory from `cpython.mem` and I have not found an equivalent function of C's `calloc`. Also, I benchmarked using `PyMem_Malloc as malloc` and the for loop vs just `calloc` from `libc.stdlib` and there was almost no difference. Regarding your 2nd point I'm not sure how I would go about implementing this. Could you provide a small code snippet or example? – zeawoas Oct 31 '19 at 21:52
  • 1
    @zeawoas Interesting to know about the performance comparison. I should be able to write up a snippet for the second point once I get home in a couple of hours since I am away from a computer now. Basically though, you should be able to use a bracket cast like cdef int[:,:] arr_view = arr to do the conversion. – CodeSurgeon Oct 31 '19 at 22:03
  • 2
    Personally I'd be strongly inclined to use a Numpy array for your 2d array, rather than some variant of `malloc`. Unless you have a good reason not to... – DavidW Oct 31 '19 at 22:11
  • Yes there are definitely benefits with the numpy approach. For example, you would not need to worry about a paired "free" call for deallocating the array. – CodeSurgeon Oct 31 '19 at 22:15
  • 1
    @CodeSurgeon I just reran the benchmarks and `libc.stdlib.malloc` + for loop and `libc.stdlib.calloc` are pretty much the same speed while `cpython.mem.PyMem_Malloc` + for loop is faster for small arrays (100x100) and slower for larger arrays (10000x10000) which seems to be in line with the Cython docs referenced earlier. @DavidW I am calling a recursive function that has to generate a small array millions of times and switching over to malloc saved me about 50% of the runtime. Otherwise I'd prefer numpy as well. – zeawoas Oct 31 '19 at 22:23
  • 1
    That seems reasonable to me. It is likely that Python does some batching to minimize the number of calls to malloc with its memory management functions for small allocations, as the docs state. Each call to malloc requires processing in both "user" space and "kernel" space, which batching can avoid. You can take a look at [this video](https://youtu.be/c0g3S_2QxWM) which describes these costs from a gamedev perspective. – CodeSurgeon Oct 31 '19 at 22:31
  • If you are making millions of tiny allocations though as you hint at, it might be better to make one large preallocation at the start and then just keep an additional variable around that tracks the amount of memory from that chunk you have used manually (think like a linear/slab allocator). Of course, this is a lot more management/code burden that you would have to do behind the scenes and is well in the territory of micro-optimization. – CodeSurgeon Oct 31 '19 at 22:50
  • 1
    @CodeSurgeon thanks for all the suggestions, I am looking into them. However, allocating the arrays takes up way less than a percent of runtime at the current state of the application and thus I guess I leave it the way it is for now. – zeawoas Nov 02 '19 at 11:08
  • @CodeSurgeon when importing functions within a `cdef extern` block, what is the best way to rename the function? I would like to do something equivalent to `import PyMem_Calloc as calloc`. For now, I have just cdef-ed an inline function named `calloc` that calls `PyMem_Calloc` and returns the result. Is there a smarter way? – zeawoas Nov 03 '19 at 13:43
  • I would take a look at [this part of the Cython docs](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html), particularly the "Resolving naming conflicts..." section. You can do something like the "import ... as ..." with "cimport" instead. If you do not want to do that every time you use your calloc function, you can do something similar to what they do for the "yield" function on that docs page to rename your function in the declaration itself. – CodeSurgeon Nov 03 '19 at 15:26
  • That being said, if you go the latter route, you should rename the function something like py_calloc so it is clear that the behavior is different. Unlike the raw or C versions, I am not sure that the PyMem versions are multithreading safe. – CodeSurgeon Nov 03 '19 at 17:16