I am trying to match many regexes against one long string and to count the '##' delimiters that appear before every match. I am using multiprocessing to search many regexes concurrently:
import re
from multiprocessing.dummy import Pool

# one regex per line; note readlines() keeps the trailing newline on each pattern
with open('many_regex', 'r') as f:
    sch = f.readlines()
with open('big_string', 'r') as f:
    text = f.read()

def search_sch(sch, text=text):
    # for each match of the pattern, count the '##' delimiters between the
    # previous match and this one, kept as a running total
    delim_index = []
    last_found = 0
    for match in re.finditer(sch, text):
        count_delims = len(re.findall('##', text[last_found:match.start()]))
        if delim_index:
            count_delims += delim_index[-1]
        delim_index.append(count_delims)
        last_found = match.end()
    return delim_index

with Pool(8) as threadpool:
    matches = threadpool.map(search_sch, sch[:100])
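To make the intent concrete, here is a toy illustration of what search_sch returns (made-up input, not my real data):

toy = "a##b##a"
print(search_sch('a', text=toy))   # -> [0, 2]: 0 delimiters before the first 'a', 2 before the second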
The threadpool.map call takes about 25 s to execute, with only a single CPU core being utilised. Any idea why more cores are not being used? Also, is there any Python library to do this faster?
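For clarity, the process-based variant I am comparing against would look roughly like this (sketch only, untested; same sch, text and search_sch as above):

# processes instead of threads: multiprocessing.dummy.Pool wraps threads,
# which all share one GIL, whereas multiprocessing.Pool uses separate processes
from multiprocessing import Pool

if __name__ == '__main__':
    with Pool(8) as procpool:
        matches = procpool.map(search_sch, sch[:100])

The if __name__ == '__main__': guard is there because child processes re-import the module under the spawn start method.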