I have a file with several thousand records and a list of regular expressions. I'd like to take each record in the file in turn and evaluate it against my list of regular expressions until a match is found.
I wrote a single-threaded script that does the job, but it is very slow (a simplified sketch of it follows the list below). To make it multithreaded I made the following adjustments:
- Created the run_target() function that is passed to the Thread constructor
- Created 5 worker threads
- Added a call to the target function inside the check_file() function
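For context, the single-threaded version boiled down to roughly this (simplified; data holds the records and exprs stands in for my list of patterns, so the names are illustrative):

import re

# try each pattern against each record, stopping at the first match
for key, value in data.items():
    for expr in exprs:
        if re.search(expr, key, re.I):
            print 'Match found'
            break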
Question: run_target() takes two arguments, which I pass to it on each iteration of the while loop in the check_file() function. Do I need to somehow pass those arguments to the Thread constructor when I create the worker threads (my untested guess at what that would look like is shown after the thread-creation loop below), or should I leave args as the default empty tuple? Or should I give run_target() keyword arguments with default values?
Also, is there a better or smarter way to tackle this? Thanks in advance. Here is what I have so far:
import re
from threading import Thread

# return 1 if the pattern matches the record (case-insensitive), 0 otherwise
def run_target(key, expr):
    matchStr = re.search(expr, key, re.I)
    if matchStr:
        return 1
    else:
        return 0
# start 5 daemon worker threads
for i in range(number_of_threads):
    worker = Thread(target=run_target, args=())   # is this where the arguments should go?
    worker.daemon = True
    worker.start()
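My untested guess at what passing the arguments at construction time would look like is below, but the values would be fixed once when the thread is created, whereas in check_file() they change on every iteration, which is where I get confused (some_record and some_pattern are just placeholders):

# untested guess: bind the arguments when the thread is created
worker = Thread(target=run_target, args=(some_record, some_pattern))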
# for every record, pull expressions off the queue until one matches
def check_file():
    for key, value in data.items():
        while True:
            expr = q.get()
            result = run_target(key, expr)
            if result == 1:
                lock.acquire()
                print 'Match found'
                lock.release()
                break
            q.task_done()
    q.join()
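For completeness, the shared objects used above are set up roughly like this (simplified; the file parsing and the actual pattern list are omitted, and the names expressions and data are placeholders):

import threading
from Queue import Queue   # 'queue' on Python 3

number_of_threads = 5
lock = threading.Lock()
q = Queue()

data = {}          # record -> value, loaded from the file beforehand (details omitted)
expressions = []   # my list of regular expressions (placeholder)

# the queue is pre-loaded with the patterns that check_file() pulls off
for expr in expressions:
    q.put(expr)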