I am new to Cython and have very little experience with C so bear with me.

I want to store a fixed-size sequence of immutable bytes objects. The object would look like:

obj = (b'abc', b'1234', b'^&$#%')

The elements in the tuple are immutable, but their length is arbitrary.

What I tried was something along the lines of:

cdef char[3] *obj
cdef char* a, b, c
a = b'abc'
b = b'1234'
c = b'^&$#%'
obj = (a, b, c)

But I get:

Storing unsafe C derivative of temporary Python reference

Can someone please point me in the right direction?

Bonus question: how do I type an arbitrarily long sequence of those 3-tuples?

Thanks!

user3758232
  • Could you clarify your "bonus" question? Do you mean creating a resizable array of `char* obj[3]` objects? Is this supposed to be accessible from Python, or just as a C-level data structure within Cython? – CodeSurgeon May 17 '18 at 19:52
  • Yes, it would be like a set or a list, e.g. [(b'a1', b'a2', b'a3'), (b'b1', b'b2', b'b3'), ...] Only accessible in Cython. – user3758232 May 18 '18 at 01:38
  • Okay, that helps clear it up a bit! The next question then is whether you want to stick with pure C-style Cython code and would prefer to write your own tailored data structures, or if you would rather use C++ features such as the STL, which can come with some useful data structures. – CodeSurgeon May 18 '18 at 02:09
  • I have close to no experience with C, so I am not sure what would be best for me, given that performance is what I am after here. I looked into numpy arrays, but that might be overkill... – user3758232 May 18 '18 at 02:12
  • In that case, I will try to address both ways. The reason I ask is that you had used `char *`. As soon as I bring in Cython's `libcpp` module, I can use C++'s `string` class instead, as well as data structures like `vector` and `unordered_map`, which are like Python's list and dict. I would suggest you take a look at this in the Cython documentation [here](http://cython.readthedocs.io/en/latest/src/userguide/wrapping_CPlusPlus.html) under the "Standard Library" section. – CodeSurgeon May 18 '18 at 02:33
  • Okay, wrote something simple up to help you get started. Ultimately though, you probably will need to either look up how to use the C++ stdlib on your own, or how to implement data structures like vectors, hashmaps, etc, by hand in C-esque cython. Good luck! – CodeSurgeon May 18 '18 at 03:15
  • Also, the performance difference between C++ data structures and Python's containers might be smaller than you expect, especially with Python's `dict`! I suggest you run some benchmarks with `time` or `timeit` and see the results for yourself. – CodeSurgeon May 18 '18 at 03:19
  • Finally, you also might find some of the points I wrote as an answer to [this question](https://stackoverflow.com/q/49883145) to be useful to help solidify some C concepts with regards to pointers. – CodeSurgeon May 18 '18 at 03:21
  • Amazing! Thanks a lot. – user3758232 May 18 '18 at 03:55

1 Answer

You are definitely close! There appear to be two issues.

First, the declaration of obj needs to say that it is an array of char* elements with a fixed size of 3. To do this, put the element type first, then the variable name, and only then the size of the array: char* obj[3]. This gives you the desired array of char*, which lives on the stack when it is declared inside a function.

Second, when you declare char* a, b, c, only a is a char*, while b and c are plain char! Cython flags this at compile time with the following warning:

Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

So you should do this instead:

cdef char* obj[3]
cdef char* a
cdef char* b
cdef char* c
a = b'abc'
b = b'1234'
c = b'^&$#%'
obj = [a, b, c]

As a side note, you can avoid repeating cdef by grouping the declarations in a cdef block:

cdef:
    char* obj[3]
    char* a
    char* b
    char* c
a = b'abc'
b = b'1234'
c = b'^&$#%'
obj = [a, b, c]

Bonus:

Given your level of experience with C and pointers in general, I will just show the more newbie-friendly approach, which uses C++ data structures. C++ has simple built-in containers like vector, which is roughly the equivalent of a Python list. The C alternative would be a pointer to a block of structs acting as an "array" of triplets. You would then be personally in charge of managing its memory with functions like malloc, realloc, and free.

Here is something to get you started. I strongly suggest you follow some online C or C++ tutorials on your own and adapt them to Cython, which should become fairly easy after some practice. Below are a test.pyx file and a setup.py file that shows how to compile it with C++ support.

test.pyx

from libcpp.vector cimport vector

"""
While it can be discouraged to mix raw char* C "strings" with C++ data types,
the code here is pretty simple.
Fixed arrays cannot be used directly for vector type, so we use a struct.
Ideally, you would use an std::array of std::string, but std::array does not 
exist in cython's libcpp. It should be easy to add support for this with an
extern statement though (out of the scope of this mini-tutorial).
"""
ctypedef struct triplet:
    char* data[3]

cdef:
    vector[triplet] obj
    triplet abc
    triplet xyz

# bytes literals match the char* members regardless of language_level
abc.data = [b"abc", b"1234", b"^&$#%"]
xyz.data = [b"xyz", b"5678", b"%#$&^"]
obj.push_back(abc)  # pretty much like Python's list.append
obj.push_back(xyz)

"""
Loops through the vector.
Cython can automagically print structs so long as their members can be 
converted trivially to python types.
"""
for o in obj:
    print(o)
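
Because the struct members are char*, whatever goes into a triplet has to be bytes, and a char* is read back as a NUL-terminated C string, so bytes with an embedded NUL would be cut short unless the length is tracked separately. The following is only a Python-side sketch of how input could be checked before it reaches the Cython code; `to_c_safe_bytes` and `make_triplet` are hypothetical helper names, not part of Cython or of the code above.

```python
def to_c_safe_bytes(value, encoding="utf-8"):
    """Return value as bytes, rejecting data a NUL-terminated char* would truncate."""
    if isinstance(value, str):
        value = value.encode(encoding)  # str -> bytes has to be explicit
    elif not isinstance(value, bytes):
        raise TypeError(f"expected str or bytes, got {type(value).__name__}")
    if b"\x00" in value:
        raise ValueError("embedded NUL byte would truncate the C string")
    return value


def make_triplet(first, second, third):
    """Build one immutable 3-tuple of bytes, the Python-level shape of a triplet."""
    return tuple(to_c_safe_bytes(v) for v in (first, second, third))


if __name__ == "__main__":
    print(make_triplet("abc", b"1234", "^&$#%"))
```

If your data can legitimately contain NUL bytes, this check is the wrong tool; in that case you would need to store an explicit length next to each pointer, or use C++'s `string` instead of char*.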

setup.py

from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize

def create_extension(ext_name):
    # language, libs, args and link_args are module-level settings from the __main__ block
    path_parts = ext_name.split(".")
    path = "./{0}.pyx".format("/".join(path_parts))
    ext = Extension(ext_name, sources=[path], libraries=libs, language=language,
            extra_compile_args=args, extra_link_args=link_args)
    return ext

if __name__ == "__main__":
    libs = []  # no external C libraries in this case
    language = "c++"  # chooses C++ rather than C since the STL is used
    args = ["-w", "-O3", "-ffast-math", "-march=native", "-fopenmp"]  # assumes gcc is the compiler
    link_args = ["-fopenmp"]  # only needed for OpenMP-parallel code; not required by this example
    annotate = True  # autogenerates an .html annotation file per .pyx
    directives = {  # saves typing @cython decorators and applies them globally
        "boundscheck": False,
        "wraparound": False,
        "initializedcheck": False,
        "cdivision": True,
        "nonecheck": False,
    }

    ext_names = [
        "test",
    ]

    extensions = [create_extension(ext_name) for ext_name in ext_names]
    setup(ext_modules = cythonize(
            extensions, 
            annotate=annotate, 
            compiler_directives=directives,
        )
    )
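
Since performance is the goal, it may be worth timing a plain Python baseline before committing to the C++ containers, as suggested in the comments on the question. The sketch below is one assumed way to do that with `timeit`; `build_triplets`, `total_length`, and the sizes used are hypothetical choices for illustration, and the workload is only a stand-in for whatever your real code does with the triplets.

```python
import timeit


def build_triplets(n):
    """Build a list of n 3-tuples of bytes, the pure-Python analogue of vector[triplet]."""
    triplets = []
    for i in range(n):
        suffix = str(i).encode("ascii")
        triplets.append((b"a" + suffix, b"b" + suffix, b"c" + suffix))
    return triplets


def total_length(triplets):
    """Walk every triplet and sum the byte lengths (a stand-in workload)."""
    return sum(len(a) + len(b) + len(c) for a, b, c in triplets)


if __name__ == "__main__":
    n = 50_000
    data = build_triplets(n)
    # timeit returns the total time in seconds for `number` calls
    build_time = timeit.timeit(lambda: build_triplets(n), number=5)
    walk_time = timeit.timeit(lambda: total_length(data), number=5)
    print(f"build: {build_time:.3f}s  walk: {walk_time:.3f}s")
```

Running the same workload against the compiled module then gives a like-for-like comparison, which is more reliable than guessing which container is faster.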
CodeSurgeon
  • Thanks! I wish that syntax were clearer in the documentation, or maybe I missed it. – user3758232 May 18 '18 at 02:27
  • The Cython documentation is pretty good for the most part. I would take a look at its "Unicode and passing strings" section to see more on how to convert between Python str, bytes, and char*. – CodeSurgeon May 18 '18 at 02:37