You can create the string using the Python/C API, which will be significantly faster than any method that uses only Python, since CPython itself is written in C and a C extension avoids the per-character interpreter overhead of a pure-Python loop. Performance will then likely depend primarily on the efficiency of the random number generator. If you are on a system with a reasonable random(3) implementation, such as the one in glibc, an efficient implementation of a random-string generator could look like this:
    #include <Python.h>

    /* gcc -shared -fpic -O2 -I/usr/include/python2.7 rnds.c -o rnds.so -lpython2.7 */

    static PyObject *rnd_string(PyObject *ignore, PyObject *args)
    {
        const char choices[] = {'0', '1'};
        PyObject *s;
        char *p, *end;
        int size;

        if (!PyArg_ParseTuple(args, "i", &size))
            return NULL;
        // reject negative sizes with a clear error instead of a SystemError.
        if (size < 0) {
            PyErr_SetString(PyExc_ValueError, "size must be non-negative");
            return NULL;
        }
        // start with a two-char string to avoid the empty string singleton.
        if (!(s = PyString_FromString("xx")))
            return NULL;
        // on failure, _PyString_Resize sets s to NULL and returns -1.
        if (_PyString_Resize(&s, size) < 0)
            return NULL;
        p = PyString_AS_STRING(s);
        end = p + size;
        for (;;) {
            unsigned long rnd = random();
            int i = 31;  // random() provides 31 bits of randomness
            while (i-- > 0 && p < end) {
                *p++ = choices[rnd & 1];
                rnd >>= 1;
            }
            if (p == end)
                break;
        }
        return s;
    }

    static PyMethodDef rnds_methods[] = {
        {"rnd_string", rnd_string, METH_VARARGS, NULL},
        {NULL, NULL, 0, NULL}
    };

    PyMODINIT_FUNC initrnds(void)
    {
        Py_InitModule("rnds", rnds_methods);
    }
Testing this code with halex's benchmark shows that it is 280x faster than the original code, and 2.3x faster than halex's code (on my machine):
    >>> from timeit import Timer

    # the above code
    >>> t1 = Timer("rnds.rnd_string(2**20)", "import rnds")
    >>> sorted(t1.repeat(10,1))
    [0.0029861927032470703, 0.0029909610748291016, ...]

    # original generator
    >>> t2 = Timer("''.join(random.choice('01') for x in xrange(2**20))", "import random")
    >>> sorted(t2.repeat(10,1))
    [0.8376679420471191, 0.840252161026001, ...]

    # halex's generator
    >>> t3 = Timer("bin(random.getrandbits(2**20-1))[2:].zfill(2**20-1)", "import random")
    >>> sorted(t3.repeat(10,1))
    [0.007007122039794922, 0.007027149200439453, ...]
Adding C code to a project is a complication, but for a 280x speedup of a critical operation, it might well be worth it.
For further efficiency improvements, look into faster RNGs and call them from separate threads, so that random number generation is parallelized. The latter would benefit from a lock-free synchronization mechanism to make sure that inter-thread communication doesn't bog down the otherwise fast generation process.
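As a rough illustration of both ideas, here is a minimal sketch that is independent of the Python API. It swaps in xorshift64* as one example of a cheaper generator (my choice for the example, not something benchmarked above, and not suitable for cryptographic use) and uses all 64 bits of each call. The threads need no synchronization at all, because each one writes to its own slice of the buffer and keeps its own RNG state. All names here (`xorshift64star`, `fill_bits`, `fill_job`, `fill_worker`, `fill_bits_parallel`, `MAX_FILL_THREADS`) and the seed-mixing constant are hypothetical, chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* One xorshift64* step; *state must be nonzero. Not cryptographically secure. */
static uint64_t xorshift64star(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * UINT64_C(0x2545F4914F6CDD1D);
}

/* Fill buf[0..n) with '0'/'1', using all 64 bits of each RNG call. */
static void fill_bits(char *buf, size_t n, uint64_t seed)
{
    uint64_t state = seed;
    char *p = buf, *end = buf + n;

    assert(seed != 0);
    while (p < end) {
        uint64_t rnd = xorshift64star(&state);
        int i;
        for (i = 0; i < 64 && p < end; i++) {
            *p++ = (char)('0' + (rnd & 1));
            rnd >>= 1;
        }
    }
}

/* Per-thread job: a private slice of the output and a private seed. */
struct fill_job {
    char *buf;
    size_t n;
    uint64_t seed;
};

static void *fill_worker(void *arg)
{
    struct fill_job *job = arg;
    fill_bits(job->buf, job->n, job->seed);
    return NULL;
}

#define MAX_FILL_THREADS 16

/* Split buf[0..n) into nthreads contiguous slices and fill them in parallel.
   Returns 0 on success, -1 if a thread could not be started. */
static int fill_bits_parallel(char *buf, size_t n, int nthreads, uint64_t seed)
{
    pthread_t tids[MAX_FILL_THREADS];
    struct fill_job jobs[MAX_FILL_THREADS];
    size_t chunk, off = 0;
    int i, started = 0, rc = 0;

    assert(nthreads >= 1 && nthreads <= MAX_FILL_THREADS);
    chunk = n / (size_t)nthreads;
    for (i = 0; i < nthreads; i++) {
        jobs[i].buf = buf + off;
        /* The last slice absorbs the remainder. */
        jobs[i].n = (i == nthreads - 1) ? n - off : chunk;
        /* Distinct nonzero seed per thread (illustrative mixing only). */
        jobs[i].seed = ((seed + (uint64_t)i) * UINT64_C(0x9E3779B97F4A7C15)) | 1;
        off += jobs[i].n;
        if (pthread_create(&tids[i], NULL, fill_worker, &jobs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

In the extension above, a call such as `fill_bits_parallel(p, size, 4, seed)` could stand in for the `for (;;)` loop; the thread count and the source of `seed` are left open here. Whether this actually beats the single-threaded version depends on the string size and on thread start-up cost, so it should be measured rather than assumed. Compile with `-pthread` where the platform requires it.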