
Is there a way to use AES-NI instructions within Cython code?

The closest thing I could find is a thread where someone accessed SIMD instructions: https://groups.google.com/forum/#!msg/cython-users/nTnyI7A6sMc/a6_GnOOsLuQJ

A related question about AES-NI in Python, "Python support for AES-NI", was never answered.

ArekBulski

1 Answer


You should be able to just declare the intrinsics in Cython as if they were normal C functions. Something like:

cdef extern from "emmintrin.h": # I'm going off the microsoft documentation for where the headers are
    # define the datatype as an opaque type
    ctypedef struct __m128i:
        pass

    __m128i _mm_set_epi32 (int i3, int i2, int i1, int i0)

cdef extern from "wmmintrin.h":
    __m128i _mm_aesdec_si128(__m128i v,__m128i rkey)

# then in some Cython function
def f():
   cdef __m128i v = _mm_set_epi32(1,2,3,4)
   cdef __m128i key = _mm_set_epi32(5,6,7,8)
   cdef __m128i result = _mm_aesdec_si128(v,key)

As for the question "how do I apply this over a bytes array?": first, get a char* for the bytes array, then iterate over it with range in steps of 16 (being careful not to run off the end).

# assuming you already have an __m128i key, and that _mm_set_epi8 and
# _mm_extract_epi8 are declared in cdef extern blocks like the ones above
cdef __m128i v
cdef __m128i result  # declared here: cdef is not allowed inside a loop
cdef char* array = python_bytes_array # auto conversion
cdef int i, j

# you NEED to ensure that the byte array has a length divisible by
# 16, otherwise you'll probably get a segmentation fault.
for i in range(0,len(python_bytes_array),16):
    # go over in chunks of 16
    v = _mm_set_epi8(array[i+15],array[i+14],array[i+13],
            # etc... fill in the rest 
            array[i+1], array[i])

    result = _mm_aesdec_si128(v,key)

    # write back to the same place?
    for j in range(16):
        array[i+j] = _mm_extract_epi8(result,j)
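
If the per-byte `_mm_set_epi8` / `_mm_extract_epi8` calls turn out to be slow, one option (following the suggestion in the comments to use `_mm_loadu_si128` and `_mm_storeu_si128`) is to write the loop as a small C function and call that from Cython. The following is only a rough sketch under a few assumptions: the function name `aesdec_round_blocks` and the `AESNI_TARGET` macro are made up for illustration, the `target` attribute is GCC/Clang-specific (with other compilers you would enable AES-NI through compiler flags instead), and, like the loop above, it applies a single `_mm_aesdec_si128` round to each block rather than performing full AES decryption.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128 */
#include <wmmintrin.h>   /* AES-NI: _mm_aesdec_si128 */

/* Assumption: GCC or Clang. The target attribute enables AES-NI for the
   function below without passing -maes on the command line. */
#if defined(__GNUC__)
#define AESNI_TARGET __attribute__((target("sse2,aes")))
#else
#define AESNI_TARGET
#endif

/* Hypothetical helper: apply one AESDEC round to every 16-byte block of
   buf, in place. key points to a 16-byte round key.
   Returns 0 on success, or -1 (leaving buf untouched) if len is not a
   multiple of 16. */
AESNI_TARGET
int aesdec_round_blocks(uint8_t *buf, size_t len, const uint8_t key[16])
{
    if (len % 16 != 0)
        return -1;

    /* Unaligned load of the 16-byte round key. */
    __m128i rkey = _mm_loadu_si128((const __m128i *)key);

    for (size_t i = 0; i < len; i += 16) {
        /* Load 16 bytes at once; buf[i] ends up in the lowest byte,
           the same layout the _mm_set_epi8 call above produces. */
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_aesdec_si128(v, rkey);
        /* Store all 16 result bytes back with a single unaligned store. */
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
    return 0;
}
```

From Cython, a function like this could be declared in a `cdef extern from` block in the same way as the intrinsics and called on a `bytearray`, which (unlike `bytes`) is meant to be modified in place.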
DavidW
  • Can that produce a loop that decodes an array without a bunch of function-call overhead inside each iteration? If not, you probably need to write encode/decode C functions. – Peter Cordes Jul 13 '16 at 19:47
  • Yes - I believe so. Cython can generate C code that can iterate over arrays and call C functions with speeds pretty comparable to C. (And when the C compiler sees those "function calls" it should translate them directly to a single processor instruction (hopefully!) rather than actually doing a function call). – DavidW Jul 13 '16 at 20:16
  • Looks fantastic. Now how do I apply this over a bytes (Python 3) object? – ArekBulski Jul 13 '16 at 22:21
  • Doing `_mm_extract_epi8` 16 times looks horrible. If you're lucky, that might compile to a single unaligned store, but if not it'll be some nasty bloated code. – Peter Cordes Jul 14 '16 at 21:07
  • @PeterCordes I agree! I'm certain it could be done much quicker. I really just wanted to answer the original question of "how do I use AES-NI instructions in Cython?" I'm happy to admit that using the instructions efficiently (or even knowing what they actually do...) is beyond my knowledge. – DavidW Jul 14 '16 at 21:15
  • @DavidW: ok, fair enough. I don't know anything about Cython, or else I'd be able to answer this. >.< I also haven't written AES encode/decode functions, but I think it's not *quite* as simple as a single instruction. There's at least a key setup instruction. How to string together intrinsics to implement AES is something you can google easily, though. To load or store 16 bytes at a time, use `_mm_loadu_si128` or `_mm_storeu_si128` on a 16-byte object. The trick is probably communicating to Cython the fact that you're accessing that many bytes. – Peter Cordes Jul 15 '16 at 00:10
  • `ctypedef struct __m128i x:` gives me a `Syntax error in struct or union definition` – Thomas Ahle Feb 11 '21 at 12:58
  • @ThomasAhle I think delete the `x`. I've no idea why I put it there originally but I suspect it was a mistake. It compiles (and makes sense) for me without the `x` – DavidW Feb 11 '21 at 18:23