ctypedef struct ReturnRows:
    double[50000] v1
    double[50000] v2
    double[50000] v3
    double[50000] v4

works, but

ctypedef struct ReturnRows:
    double[100000] v1
    double[100000] v2
    double[100000] v3
    double[100000] v4

fails with `Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)`

It does not make sense to me, because the upper limit should be close to the amount of memory the system can dedicate to this processing task. Is there an upper limit set somewhere?

Here is my builder:

from distutils.core import setup, Extension
import numpy as np
from Cython.Build import cythonize

file_names = ['example_cy.pyx', 'enricher_cy.pyx']

for fn in file_names:
    print("cythonize %s" % fn)
    setup(ext_modules=cythonize(fn),
          author='CGi',
          author_email='hi@so.com',
          description='Utils for faster data enrichment.',
          packages=['distutils', 'distutils.command'],
          include_dirs=[np.get_include()])

Edit from the question: How do I use the struct? I iterate over it, with the data coming from a pandas DataFrame:

cpdef ReturnRows cython_atrs(list v1, list v2, list v3, list v4):

    cdef ReturnRows s_ReturnRows  # allocated as a function-local variable (on the stack)

    s_ReturnRows.v1 = [0] * 50000
    s_ReturnRows.v2 = [0] * 50000
    s_ReturnRows.v3 = [0] * 50000
    s_ReturnRows.v4 = [0] * 50000

    # tmp counters to keep track of the latest data averages and so on.
    cdef int lines_over_v1 = 0
    cdef double all_ranges = 0
    cdef int some_index = 0
    cdef Py_ssize_t i

    for i in range(len(v3) - 1):

        # trs
        s_ReturnRows.v1[i] = round(v2[i] - v3[i], 2)
        # A lot more calculations, which is why I really need this loop.
gies0r
    Does this answer your question? [Segmentation fault on large array sizes](https://stackoverflow.com/questions/1847789/segmentation-fault-on-large-array-sizes) – ead Dec 27 '19 at 22:30
  • @ead Thanks, but sadly not really. The only real solution proposed there is to disable memory limits from the OS using `ulimit -s unlimited`, which is basically not a good idea. I am searching for a way to actually store these large amounts of data, compliant with the OS memory management. – gies0r Dec 27 '19 at 22:39
  • This isn't a very useful question without knowing how you're allocating these structs (which you don't show). The more useful solution that the linked question gives is to allocate on the heap rather than the stack (`new` in C++, but probably `malloc` in Cython) – DavidW Dec 27 '19 at 22:47
  • @DavidW Added some information about the processing. Maybe the main question comes down to: is it possible to pre-allocate and use large areas like this, or is this better suited to other data structures (such as 2D arrays)? I would be interested in storing more than 200 million rows in one rotating data structure in the end and was thinking that native C types are best suited for that. – gies0r Dec 27 '19 at 23:27

1 Answer


As the question @ead linked suggests, the solution is to allocate the variables on the heap rather than on the stack (as a function-local variable). The reason is that space on the stack is pretty limited (~8 MB on Linux), while the heap is (usually) whatever memory is available on your PC. For scale: one ReturnRows with four double[100000] members is 4 × 100,000 × 8 bytes ≈ 3.2 MB, and since your cpdef function returns the struct by value, temporary copies can push a single call past the stack limit.

The linked question mainly refers to new/delete as the C++ way of doing this; while Cython code can use C++, C is more commonly used, and there you use malloc/free. The Cython documentation covers this well, but to demonstrate it with the code from your question:

from libc.stdlib cimport malloc, free

# note some discussion about the return value at the end of the answer...
def cython_atrs(list v1, list v2, list v3, list v4):

    cdef ReturnRows *s_ReturnRows  # pointer, not value
    s_ReturnRows = <ReturnRows*>malloc(sizeof(ReturnRows))
    if s_ReturnRows is NULL:
        raise MemoryError()
    try:
        # all your code goes here and remains the same...
        pass
    finally:
        free(s_ReturnRows)  # always released, even if an exception is raised

You could also use a module-level global variable, but you probably don't want to do that.
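For completeness, a minimal sketch of that (discouraged) global-variable approach; the names here are just illustrative:

cdef ReturnRows g_return_rows  # static storage: lives for the module lifetime, not on the stack

def cython_atrs_global(list v1, list v2, list v3, list v4):
    # every call overwrites the single shared instance; not thread-safe
    cdef Py_ssize_t i
    for i in range(len(v3) - 1):
        g_return_rows.v1[i] = round(v2[i] - v3[i], 2)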

Another option would be to use a cdef class instead of a struct:

cdef class ReturnRows:
    cdef double[50000] v1
    cdef double[50000] v2
    cdef double[50000] v3
    cdef double[50000] v4

This is automatically allocated on the heap, and memory is tracked by Python.
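A minimal usage sketch, assuming the cdef class above (the function name is invented for illustration):

def cython_atrs_cls(list v1, list v2, list v3, list v4):
    # instantiating the extension type heap-allocates the whole object,
    # including the four embedded double[50000] arrays
    cdef ReturnRows rows = ReturnRows()
    cdef Py_ssize_t i
    for i in range(len(v3) - 1):
        rows.v1[i] = round(v2[i] - v3[i], 2)
    return rows  # reference-counted; Python frees it when no longer used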


You could also use 2D NumPy arrays (or arrays from another Python library). These are also allocated on the heap (but that is hidden from you). One advantage is that Python keeps track of the memory, so you can't forget to free it. A second advantage is that you can easily vary the array size without recompilation. If you invent some particularly pointless microbenchmark that allocates lots of arrays and does nothing else, you may find a performance difference, but for most normal code you won't. Access through a typed memoryview should be as quick as through a C pointer. You can find lots of questions comparing the speed and other features of these options, but really you should just pick the one you find easiest to write (with structs, that may be C).
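A sketch of that approach, with one 2D array replacing the four struct members (the shape and names are just an assumption about your layout):

import numpy as np

def cython_atrs_np(list v1, list v2, list v3, list v4):
    cdef Py_ssize_t n = len(v3)
    # the buffer lives on the heap and is owned by the ndarray
    result = np.zeros((4, n), dtype=np.float64)
    cdef double[:, :] rows = result  # typed memoryview: C-speed element access
    cdef Py_ssize_t i
    for i in range(n - 1):
        rows[0, i] = round(v2[i] - v3[i], 2)
    return result  # a normal Python object; no manual free needed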


The return of ReturnRows in your function adds a complication (and makes your existing code dubious too, even when it doesn't crash). You should either write a cdef function and return a ReturnRows*, moving the deallocation to the calling function (see the sketch below), or you should write a def function and return a valid Python object. That might push you towards NumPy arrays as a better solution, or maybe a cdef class.
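A sketch of the cdef-function variant, with ownership passed to the caller (the function names are invented for illustration):

from libc.stdlib cimport malloc, free

cdef ReturnRows* make_return_rows() except NULL:
    cdef ReturnRows *rows = <ReturnRows*>malloc(sizeof(ReturnRows))
    if rows is NULL:
        raise MemoryError()
    return rows  # ownership passes to the caller

def caller(list v1, list v2, list v3, list v4):
    cdef ReturnRows *rows = make_return_rows()
    cdef Py_ssize_t i
    try:
        for i in range(len(v3) - 1):
            rows.v1[i] = round(v2[i] - v3[i], 2)
        # ... use the results here ...
    finally:
        free(rows)  # the caller, not the producer, releases the memory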

What your current function does instead is a huge and inefficient conversion of ReturnRows to a Python dictionary (containing Python lists for the arrays) whenever it's called from Python. This probably isn't what you want.

DavidW