I have a program where I need to compile several thousand large regexes, all of which will be used many times. The problem is that it takes too long (113 secs, according to cProfile) to re.compile() them. (BTW, actually searching using all of these regexes takes < 1.3 secs once compiled.)
If I don't precompile, it just postpones the problem to when I actually search, since re.search(expr, text) implicitly compiles expr. Actually, it's worse, because re is going to recompile the entire list of regexes every time I use them.
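A minimal sketch of the two call styles being compared, using a made-up pattern and text. The module-level function compiles the pattern string on the way in (or finds it in re's bounded internal cache), while a precompiled pattern object pays the compilation cost once, up front:

```python
import re

# Hypothetical pattern and text, chosen only for illustration.
expr = r"[a-z]+\d{2,}"
text = "order id: abc123"

# Module-level call: re.search() compiles expr (or finds it in re's
# bounded internal cache) before it searches.
implicit = re.search(expr, text)

# Precompiled: pay the compilation cost once, then reuse the object.
compiled = re.compile(expr)
explicit = compiled.search(text)

# Both styles find the same match.
same_match = implicit.span() == explicit.span()
```

With several thousand distinct expressions, the internal cache presumably cannot hold them all, which would explain why the implicit style keeps recompiling.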
I tried using multiprocessing, but that actually slows things down. Here's a small test to demonstrate:
## rgxparallel.py ##
import re
import multiprocessing as mp

def serial_compile(strings):
    return [re.compile(s) for s in strings]

def parallel_compile(strings):
    print("Using {} processors.".format(mp.cpu_count()))
    pool = mp.Pool()
    result = pool.map(re.compile, strings)
    pool.close()
    return result

l = map(str, xrange(100000))
And my test script:
#!/bin/sh
python -m timeit -n 1 -s "import rgxparallel as r" "r.serial_compile(r.l)"
python -m timeit -n 1 -s "import rgxparallel as r" "r.parallel_compile(r.l)"
# Output:
# 1 loops, best of 3: 6.49 sec per loop
# Using 4 processors.
# Using 4 processors.
# Using 4 processors.
# 1 loops, best of 3: 9.81 sec per loop
I'm guessing that the parallel version is:
- In parallel, compiling and pickling the regexes, ~2 secs
- In serial, un-pickling, and therefore recompiling them all, ~6.5 secs
Together with the overhead for starting and stopping the processes, multiprocessing on 4 processors is more than 25% slower than serial.
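The guess about un-pickling can be checked with a small sketch. It assumes CPython's standard pickle support for compiled patterns and uses a made-up pattern; the point is that the pickled payload carries the pattern's source string and flags, so whoever loads it has to build the pattern again:

```python
import pickle
import re

# Hypothetical pattern, for illustration only.
compiled = re.compile(r"foo[0-9]+bar", re.IGNORECASE)

# Pickling a compiled pattern stores its source string and flags,
# not the compiled program itself.
payload = pickle.dumps(compiled)
source_in_payload = compiled.pattern.encode("ascii") in payload

# Unpickling rebuilds the pattern from that source string. Inside one
# process re's internal cache can hide the cost; a parent process that
# receives patterns from workers has to compile them itself.
restored = pickle.loads(payload)
```

If that holds, returning compiled patterns from worker processes cannot save the parent any compilation work, which matches the timings above.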
I also tried divvying up the list of regexes into 4 sub-lists and pool.map-ing the sublists, rather than the individual expressions. This gave a small performance boost, but I still couldn't get better than ~25% slower than serial.
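The sub-list variant is not shown in the post, so the following is only a sketch of what it might look like. The helper names chunked, compile_chunk and parallel_compile_chunked are hypothetical, and the code is written for Python 3:

```python
import multiprocessing as mp
import re


def chunked(items, n_chunks):
    """Split items into n_chunks contiguous sub-lists of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def compile_chunk(strings):
    """Worker task: compile one whole sub-list in a single call."""
    return [re.compile(s) for s in strings]


def parallel_compile_chunked(strings, n_chunks=4):
    """Submit one pool task per sub-list instead of one per expression.

    The compiled patterns are still pickled on the way back, so the
    parent process rebuilds each one from its source string.
    """
    pool = mp.Pool(n_chunks)
    try:
        nested = pool.map(compile_chunk, chunked(strings, n_chunks))
    finally:
        pool.close()
        pool.join()
    # Flatten back into one list, preserving the original order.
    return [pattern for chunk in nested for pattern in chunk]
```

Chunking cuts the number of inter-process messages, which is probably where the small boost comes from, but it does not avoid sending the patterns back. On platforms that spawn worker processes, parallel_compile_chunked should be called from under an `if __name__ == "__main__":` guard.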
Is there any way to compile faster than serial?
EDIT: Corrected the running time of the regex compilation.
I also tried using threading, but due to the GIL, only one processor was used. It was slightly better than multiprocessing (130 secs vs. 136 secs), but still slower than serial (113 secs).
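The threaded code is not shown either. A rough equivalent, assuming a thread pool from multiprocessing.pool.ThreadPool rather than hand-written threading code (threaded_compile is a hypothetical name), could be:

```python
import re
from multiprocessing.pool import ThreadPool


def threaded_compile(strings, n_threads=4):
    """Compile patterns on a pool of threads.

    Threads share memory, so nothing is pickled, but in CPython the GIL
    lets only one thread execute Python bytecode at a time.
    """
    pool = ThreadPool(n_threads)
    try:
        return pool.map(re.compile, strings)
    finally:
        pool.close()
        pool.join()
```

This avoids the pickling round trip entirely, which would be consistent with threads coming out slightly ahead of processes while still not beating the serial version.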
EDIT 2: I realized that some regexes were likely to be duplicated, so I added a dict for caching them. This shaved off ~30 secs. I'm still interested in parallelizing, though. The target machine has 8 processors, which with ideal scaling would reduce compilation time to ~15 secs.
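The dict-based caching from EDIT 2 might look roughly like this; the exact code is not shown in the post, and compile_unique is a hypothetical name:

```python
import re


def compile_unique(strings):
    """Compile each distinct pattern string once and share the result."""
    cache = {}
    compiled = []
    for s in strings:
        if s not in cache:
            cache[s] = re.compile(s)
        # Duplicates reuse the already compiled pattern object.
        compiled.append(cache[s])
    return compiled
```

The returned list has one entry per input string, in the same order, so it can stand in for the output of serial_compile.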