
I'm trying to create a Python module that converts an array of C structs returned from a C shared library into a NumPy array. I'm new to both NumPy and Cython (but have been doing C for a long time), so I've been learning as I go. A few notes:

1) the C shared library, called HBtrial, is in a different directory
2) the C code calloc()s the memory, fills in the structs and returns a pointer to the array of structs
3) I need the returned array to be a NumPy array (preferably a structured array, which will probably then be converted to a Pandas DataFrame)

After trying a number of things, I got the farthest (including getting the .pyx file to compile) by doing the following.

trial.pyx

import cython
from cpython.ref cimport PyTypeObject
cimport numpy as np
import numpy as np
cimport HBtrial


cdef extern from "numpy/ndarrayobject.h":
    object PyArray_NewFromDescr(PyTypeObject *subtype,
        np.dtype newdtype,
        int nd,
        np.npy_intp* dims,
        np.npy_intp* strides,
        void* data,
        int flags,
        object parent)

np.import_array()

class MyStruct(object):
    dtype_mystruct = np.dtype ([('item', 'S16'),
                                ('date', 'S16'),
                                ('val1', 'u1'),
                                ('val2', 'u1'),
                                ('val3', 'i2')
                               ])

    def __init__(self):
        pass

    def return_dtype(self):
        return self.dtype_mystruct

    @cython.boundscheck(False)
    def return_values(self):
        cdef int rows
        cdef HBtrial.MYSTRUCT *arr = HBtrial.return_values(&rows)

        print arr[1]
        print "npy array"
        cdef np.npy_intp dims = rows

        nparr = np.PyArray_NewFromDescr(np.ndarray,
                                        self.dtype_mystruct,
                                        1,
                                        dims,
                                        <object>NULL,
                                        <object><void *>arr,
                                        0,
                                        <object>NULL)

        print nparr[1]
        return nparr

It compiles okay. Then I try to use it in a small Python script:

try.py:

#!/usr/bin/env python

import sys
import os
import numpy as np

from trial import MyStruct

def main():
    mystruct = MyStruct()
    dt = mystruct.return_dtype()
    print dt
    arr = mystruct.return_values()
    print arr

if __name__ == "__main__":
    main()

When I run it, it prints out the "print dt" line fine, but I get the following error:

Traceback (most recent call last):
  File "./try.py", line 18, in <module>
    main()
  File "./try.py", line 14, in main
    arr = mystruct.return_values()
  File "trial.pyx", line 43, in trial.MyStruct.return_values (trial.c:1569)
    nparr = np.PyArray_NewFromDescr(np.ndarray,
AttributeError: 'module' object has no attribute 'PyArray_NewFromDescr'

How do I get past this error? I feel like I'm maybe missing something basic. Any ideas? If I'm totally off base in my approach, let me know that, too.
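A quick way to see why the traceback happens: in trial.pyx the name `np` is bound both to the cimported NumPy declarations and to the imported Python module. `PyArray_NewFromDescr` is declared in the file's own `cdef extern` block, not (apparently, for the Cython/NumPy versions used here) in the cimported numpy declarations, so `np.PyArray_NewFromDescr` falls back to an ordinary attribute lookup on the Python-level numpy module at run time. A minimal check shows that module does not export the C-API function:

```python
import numpy as np

# The NumPy C-API functions are only callable from C/Cython; the Python-level
# numpy module, which np.<name> falls back to at run time, has no such attribute.
print(hasattr(np, "PyArray_NewFromDescr"))  # False
```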

Here are the other files, if it helps:

trial.pxd:

from libc.stdint cimport int8_t, int16_t, uint8_t, uint16_t

cdef extern from "HBtrial.h" nogil:

    ctypedef packed struct MYSTRUCT:
        char item[16];
        char date[16];
        uint8_t val1;
        uint8_t val2;
        int16_t val3;

    cdef MYSTRUCT *return_values(int *rows)

HBtrial.h:

#ifndef HBTRIAL_H
#define HBTRIAL_H

#include <stdint.h>  /* uint8_t and int16_t are used below */

typedef struct {
    char item[16];
    char date[16];
    uint8_t val1;
    uint8_t val2;
    int16_t val3;
} MYSTRUCT;

MYSTRUCT *return_values(int *rows);

#endif
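Whatever route produces the NumPy array, `dtype_mystruct` has to describe exactly the same bytes as `MYSTRUCT`. A minimal sketch, assuming a typical platform where `int16_t` is 2-byte aligned, that mirrors the header with ctypes and compares sizes and field offsets against the dtype used in trial.pyx (`layouts_match` is a hypothetical helper written for this check):

```python
import ctypes
import numpy as np

# ctypes mirror of MYSTRUCT from HBtrial.h (natural alignment, not packed).
class MYSTRUCT(ctypes.Structure):
    _fields_ = [("item", ctypes.c_char * 16),
                ("date", ctypes.c_char * 16),
                ("val1", ctypes.c_uint8),
                ("val2", ctypes.c_uint8),
                ("val3", ctypes.c_int16)]

# Same dtype as MyStruct.dtype_mystruct in trial.pyx.
dtype_mystruct = np.dtype([('item', 'S16'),
                           ('date', 'S16'),
                           ('val1', 'u1'),
                           ('val2', 'u1'),
                           ('val3', 'i2')])

def layouts_match(ctype, dtype):
    """Hypothetical helper: True if total size and every field offset agree."""
    if ctypes.sizeof(ctype) != dtype.itemsize:
        return False
    for name, _ in ctype._fields_:
        # ctypes field descriptors expose .offset; dtype.fields[name] is (dtype, offset)
        if getattr(ctype, name).offset != dtype.fields[name][1]:
            return False
    return True

print(ctypes.sizeof(MYSTRUCT), dtype_mystruct.itemsize)  # 36 36
print(layouts_match(MYSTRUCT, dtype_mystruct))           # True
```

This particular field order needs no padding, so the `packed` declaration in the .pxd and the unpacked struct in HBtrial.h happen to agree. For a struct that does need padding, `np.dtype(..., align=True)` builds a dtype with C-style alignment.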

HBtrial.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "HBtrial.h"

MYSTRUCT *return_values(int *rows)
{
    int i;
    MYSTRUCT *arr;
    int numrows = 5;

    arr = calloc(numrows, sizeof(MYSTRUCT));
    if (arr == NULL) {      /* allocation failed: report zero rows */
        *rows = 0;
        return NULL;
    }

    for (i=0; i < numrows; i++) {
        sprintf(arr[i].item, "row%d", i);
        sprintf(arr[i].date, "201908100%d", i+1);
        arr[i].val1 = i+2;
        arr[i].val2 = i+i;
        arr[i].val3 = i*i;
    }
    *rows = numrows;
    return(arr);
}

HBtrial.c and HBtrial.h are in /home/xxxx/lib/try3 and get compiled into a shared library, "libHBtrial.so".

setup.py:

from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize
import numpy as np

trial = Extension(
    name="trial",
    sources=["trial.pyx"],
    extra_compile_args=["-std=c99"],
    libraries=["HBtrial"],
    library_dirs=["/home/xxxx/lib/try3"],
    include_dirs=[np.get_include(), "/home/xxxx/lib/try3"]
)

setup(
    name="trial",
    ext_modules=cythonize([trial])
)

If there's a better way, I'd be interested in that, too. For instance, I tried other things like converting the returned array to a Cython typed memoryview, or using np.frombuffer(), but always got an error that it "Cannot convert MYSTRUCT * to" a memoryview or Python object or whatever.
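One alternative worth comparing is to skip Cython and use ctypes, which is a different technique from the Cython approach above. `np.frombuffer()` needs an object that supports the buffer protocol, and a raw `MYSTRUCT *` is not one, but a ctypes array created with `from_address` over the same memory is. The sketch below assumes the struct layout shown in HBtrial.h; `fake_return_values()` is a hypothetical stand-in that fills rows the way HBtrial.c does, so the example runs without the shared library:

```python
import ctypes
import numpy as np

class MYSTRUCT(ctypes.Structure):
    # Mirrors the C typedef in HBtrial.h.
    _fields_ = [("item", ctypes.c_char * 16),
                ("date", ctypes.c_char * 16),
                ("val1", ctypes.c_uint8),
                ("val2", ctypes.c_uint8),
                ("val3", ctypes.c_int16)]

dtype_mystruct = np.dtype([('item', 'S16'), ('date', 'S16'),
                           ('val1', 'u1'), ('val2', 'u1'), ('val3', 'i2')])

def fake_return_values():
    """Hypothetical stand-in for the C return_values(), filled like HBtrial.c.

    With the real library, something along these lines would be used instead:
        lib = ctypes.CDLL("/home/xxxx/lib/try3/libHBtrial.so")
        lib.return_values.restype = ctypes.POINTER(MYSTRUCT)
        lib.return_values.argtypes = [ctypes.POINTER(ctypes.c_int)]
        rows = ctypes.c_int()
        ptr = lib.return_values(ctypes.byref(rows))
    """
    numrows = 5
    arr = (MYSTRUCT * numrows)()
    for i in range(numrows):
        arr[i].item = b"row%d" % i
        arr[i].date = b"201908100%d" % (i + 1)
        arr[i].val1 = i + 2
        arr[i].val2 = i + i
        arr[i].val3 = i * i
    return arr, numrows

def to_numpy(ptr, rows, dtype):
    """Copy `rows` structs starting at the ctypes pointer `ptr` into a NumPy array."""
    nbytes = rows * dtype.itemsize
    # Byte-level ctypes view over the C memory; this exposes the buffer protocol.
    raw = (ctypes.c_ubyte * nbytes).from_address(ctypes.addressof(ptr.contents))
    # frombuffer gives a view on the C memory; .copy() makes NumPy own its data.
    return np.frombuffer(raw, dtype=dtype, count=rows).copy()

c_arr, rows = fake_return_values()
ptr = ctypes.cast(c_arr, ctypes.POINTER(MYSTRUCT))
nparr = to_numpy(ptr, rows, dtype_mystruct)
print(nparr[1])
```

Copying costs one pass over the data but makes ownership simple: NumPy frees its own copy, and the calloc'd buffer can be released on the C side right afterwards. HBtrial as shown has no function for that, so something like a hypothetical `free_values()` would have to be added to the library.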

bkb105
  • it is `PyArray_NewFromDescr` not `np.PyArray_NewFromDescr`. – ead Sep 09 '19 at 18:48
  • Then you need to drop all in arguments of `PyArray_NewFromDescr` as well. – ead Sep 09 '19 at 18:55
  • @ead I tried that. It fails to compile with an error of `undeclared name not builtin: PyArray_NewFromDescr`. Am I missing an import or something? It doesn't seem to know about things I would think it ought to. – bkb105 Sep 09 '19 at 19:08
  • If I were doing it I'd probably try to [implement the Python buffer protocol](https://cython.readthedocs.io/en/latest/src/userguide/buffer.html): create a cdef class that manages the memory from `return_values` and implements `__getbuffer__`. You should be able to pass that straight to Numpy (Python API, rather than C API). – DavidW Sep 09 '19 at 19:09
  • You also have another problem: a possible memory leak, because who owns `arr`? See a somewhat related answer: https://stackoverflow.com/a/55959886/5769463 – ead Sep 09 '19 at 19:11
  • @DavidW I doubt implementing buffer protocol is easier than figuring out how to correctly call `PyArray_NewFromDescr` – ead Sep 09 '19 at 19:19
  • @bkb105 if you see this error, then your code isn't the code presented in the question (see `cdef extern from ...` block). However, you have other problems in how you call `PyArray_NewFromDescr`, e.g. `np.ndarray` isn't of type `PyTypeObject *`. – ead Sep 09 '19 at 19:23
  • @ead - no, I don't think it's easier. It does solve the memory management problem (and the associated problem of matching malloc with Numpy's free) and that's why I like it slightly more. – DavidW Sep 09 '19 at 21:08

1 Answer


Okay, I finally got it working. Thanks to comments by @ead and something I had already been thinking about, I realized I needed to call PyArray_NewFromDescr through the declaration in my own "cdef extern" block (without the np. prefix) and change a couple of arguments.
As part of the "cdef extern..." block I added:

PyTypeObject PyArray_Type

Then the call to the routine becomes:

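# PyArray_NewFromDescr steals a reference to the dtype, so give it an extra one
# (requires: from cpython.ref cimport Py_INCREF)
Py_INCREF(self.dtype_mystruct)
cdef np.ndarray nparr = PyArray_NewFromDescr(&PyArray_Type,
                                            self.dtype_mystruct,
                                            1,
                                            &dims,
                                            NULL,
                                            <void *>arr,
                                            0,
                                            <object>NULL)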

Now I can print the array with the proper values on return from the routine.
As for the memory leak, based on other posts I found, I should be able to take care of that by setting the OWNDATA flag on the array before I return it, right?

bkb105