Optimising string generate and test

Question

I am trying to run a simulation to test the average Levenshtein distance between random binary strings.

To speed it up I am using this C extension.

My code is as follows.

from Levenshtein import distance 
for i in xrange(20):
    sum = 0
    for j in xrange(1000):
        str1 =  ''.join([random.choice("01") for x in xrange(2**i)])
        str2 =  ''.join([random.choice("01") for x in xrange(2**i)])
        sum += distance(str1,str2)
    print sum/(1000*2**i)

I think the slowest part is now the string generation. Can that be sped up somehow or is there some other speed up I could try?

I also have 8 cores but I don't know how hard it would be take advantage of those.

Unfortunately I can't use pypy because of the C extension.

halex · Accepted Answer · 2013-04-27T09:57:27.983

The following solution should be way better in terms of runtime.

It generates a number with 2**i random bits (random.getrandbits), converts it to a string of the number's binary representation (bin), takes everything beginning with the 3nd character to the end (because the result of bin is prepended with '0b') and prepends the resulting string with zeros to have the length you want.

str1 = bin(random.getrandbits(2**i))[2:].zfill(2**i)

Quick timing for your maximum string length of 2**20:

from timeit import Timer
>>> t=Timer("''.join(random.choice('01') for x in xrange(2**20))", "import random")
>>> sorted(t.repeat(10,1))
[0.7849910731831642, 0.787418033587528, 0.7894113893237318, 0.789840397476155, 0.7907980049587877, 0.7908638883536696, 0.7911707057912736, 0.7935838766477445, 0.8014726470912592, 0.8228315074311467]
>>> t=Timer("bin(random.getrandbits(2**20))[2:].zfill(2**20)", "import random")
>>> sorted(t.repeat(10,1))
[0.005115922216191393, 0.005215130351643893, 0.005234282501078269, 0.005451850921190271, 0.005531523863737675, 0.005627284612046424, 0.005746794025981217, 0.006217553864416914, 0.014556016781853032, 0.014710766150983545]

That's a speedup of a factor of 150 on average.

@marshall: you can speed it up even further using [`b2a_bin(os.urandom(2**i/8))` (C extension written in Cython)](https://gist.github.com/zed/3526111). See [Multiplying a huge number times random() (Python)](http://stackoverflow.com/q/12161988/4279) — jfs, Apr 28 '13 at 01:52

user4815162342 · Answer 2 · 2013-04-27T18:30:44.410

You can create a Python string using the Python/C API, which will be significantly faster than any method that exclusively uses Python, since Python itself is implemented in Python/C. Performance will likely primarily depend on the efficiency of the random number generator. If you are on a system with a reasonable random(3) implementation, such as the one in glibc, an efficient implementation of random string would look like this:

#include <Python.h>

/* gcc -shared -fpic -O2 -I/usr/include/python2.7 -lpython2.7 rnds.c -o rnds.so */

static PyObject *rnd_string(PyObject *ignore, PyObject *args)
{
    const char choices[] = {'0', '1'};
    PyObject *s;
    char *p, *end;
    int size;
    if (!PyArg_ParseTuple(args, "i", &size))
        return NULL;
    // start with a two-char string to avoid the empty string singleton.
    if (!(s = PyString_FromString("xx")))
        return NULL;
    _PyString_Resize(&s, size);
    if (!s)
        return NULL;
    p = PyString_AS_STRING(s);
    end = p + size;
    for (;;) {
      unsigned long rnd = random();
      int i = 31;   // random() provides 31 bits of randomness
      while (i-- > 0 && p < end) {
        *p++ = choices[rnd & 1];
        rnd >>= 1;
      }
      if (p == end)
        break;
    }
    return s;
}

static PyMethodDef rnds_methods[] = {
    {"rnd_string",  rnd_string, METH_VARARGS },
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initrnds(void)
{
    Py_InitModule("rnds", rnds_methods);
}

Testing this code with halex's benchmark shows that it is 280x faster than the original code, and 2.3x faster than halex's code (on my machine):

# the above code
>>> t1 = Timer("rnds.rnd_string(2**20)", "import rnds")
>>> sorted(t1.repeat(10,1))
[0.0029861927032470703, 0.0029909610748291016, ...]
# original generator
>>> t2 = Timer("''.join(random.choice('01') for x in xrange(2**20))", "import random")
>>> sorted(t2.repeat(10,1))
[0.8376679420471191, 0.840252161026001, ...]
# halex's generator
>>> t3 = Timer("bin(random.getrandbits(2**20-1))[2:].zfill(2**20-1)", "import random")
>>> sorted(t3.repeat(10,1))
[0.007007122039794922, 0.007027149200439453, ...]

Adding C code to a project is a complication, but for a 280x speedup of a critical operation, it might well be worth it.

For further efficiency improvement, look into faster RNGs, and invoke call them from separate threads, in order to parallelize random number generation is parallelized. The latter would benefit from a lock-free synchronization mechanism to make sure that inter-thread communication doesn't bog down the otherwise fast generation process.

It's really interesting to see that your C code is *only* a factor 3 faster than my *pure* python solution. Thought it would be better :) — halex, Apr 27 '13 at 10:00
@halex I was surprised as well! As always, the trick is to make use of Python's builtins such as `bin`. I suspect that the 3x speedup is a result of using a faster (and less sophisticated) RNG. — user4815162342, Apr 27 '13 at 10:05
@marshall You're welcome. Note that thanking each responder in turn is not a custom on StackOverflow. Instead, you are expected to upvote the answers you consider useful and, as a responder, [accept the best one](http://meta.stackexchange.com/a/5235). — user4815162342, Apr 27 '13 at 16:22
Also, feel free to provide feedback if/when you actually try some of the suggestions. I'd be curious to know whether they solve your performance issues. — user4815162342, Apr 27 '13 at 16:23
Thanks for the etiquette note however it seems polite to thank people as you won't know if I upvoted or not. I also can't downvote the clearly incorrect answer sadly. I have used the python answer. I think further speed ups will come from tuning the C implementation of the Levenshtein function now and parallelisation. — marshall, Apr 27 '13 at 17:00
@marshall Polite in everyday life, absolutely, but on StackOverflow it's superfluous, as I'm sure you'll agree after having spent more time on the site. In any case, welcome to StackOverflow. :) — user4815162342, Apr 27 '13 at 17:28

Optimising string generate and test

2 Answers2