
I have the following bit of cython code:

import numpy as np

cdef int[:] random_binary_string
random_binary_string = np.random.choice(np.array([0, 1], dtype=np.intc), size=num_bits)

The annotator highlights the second line, indicating to me that I should replace it with something in pure C. My question is about the right way to do this.

Solutions I found:

I can do something like the following:

from libc.stdlib cimport rand
convert_to_binary_representation(rand()) 

Here convert_to_binary_representation is a function I would write that takes the 32-bit integer and converts it to binary form as an array of ints valued 0 and 1. I'd then glue these together to get a string of the desired size. While that works, I suspect it's not the right answer, given that I'm trying to make this code as fast as possible.

I also found this question: Canonical way to generate random numbers in Cython. Based on the answer there, my guess is that the right thing to do is to wrap some tool from the C++ standard library. Perhaps this one: https://en.cppreference.com/w/cpp/numeric/random/independent_bits_engine ? I've gotten kind of lost on how to do this -- hopefully learning more C++ will help, but until then...

Question: What is the right way to replace that numpy call?

Further information on how I am using random_binary_string:

  1. This is for a Monte Carlo calculation for a scientific computing project, I don't need cryptographic security.
  2. Each bit represents a choice of whether or not to include a certain element in a set.
  • If you're making large arrays then Numpy will be pretty good. It'll be highlighted because there's Python overhead in calling it, but internally it'll run in C and the calling overhead will be small in comparison to the work it's doing. – DavidW Jan 20 '21 at 07:37
  • @DavidW Thanks. These arrays are not so big compared to the number of times I call them. Maybe a reasonable thing to do would be to make one call to numpy to get a very long array of random bits, and then access it piece by piece. I'll try that out. – Areawoman Jan 20 '21 at 09:09
  • Oh, I guess also the implication is that for complicated subroutines I should profile the code to see what's actually slowing it down, rather than going by the annotation highlights alone. – Areawoman Jan 20 '21 at 09:27
  • Yes - that (i.e. the single call to numpy for a big array) is probably the simplest approach. And profiling is always good. The annotation highlights are a useful tool, but they don't distinguish between a slow line that's called once and a slow line in a huge loop for example – DavidW Jan 20 '21 at 09:29

0 Answers