I need some help with the code below.
GOAL:
I have a huge ".txt" file and need a script that searches it for a string given as input by the user.
In this specific code I'm trying to speed things up by creating multiple threads (8, to be exact) and assigning a slice of the ".txt" file to each thread, so each thread searches only a portion of the file.
Since I need the return value of the search function that I pass to the threads, I used the "ThreadWithReturnValue" class shown in this question:
How to get the return value from a thread in python?
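For reference, the slicing described above can be sketched as follows. This is only an illustration of the arithmetic: `slice_bounds` is a hypothetical helper that is not part of my script, and it assumes the total line count is known in advance.

```python
def slice_bounds(total_lines, n_slices):
    """Return (start, end) line ranges that split total_lines into n_slices pieces."""
    size = total_lines // n_slices
    bounds = []
    for i in range(n_slices):
        start = i * size
        # The last slice absorbs any remainder so no trailing lines are skipped.
        end = total_lines if i == n_slices - 1 else start + size
        bounds.append((start, end))
    return bounds


# 1,000,000 lines split across 8 threads gives 125,000 lines per slice.
bounds = slice_bounds(1_000_000, 8)
```

Each `(start, end)` pair is meant to be used as a half-open range, in the same way as `lines[start:end]`.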
PROBLEM:
The problem is that I only get a result if it is located in the FIRST thread's slice. The results from the other threads are always empty lists.
If I add logging inside the search function, the log messages appear in the terminal for the first thread only, while the log messages for the other threads do not show up. This makes me believe that the other threads are not even running, even though the thread list shows the correct number of created threads.
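A likely explanation, which I have not confirmed against the real file, is that all the threads share one file object. Calling `readlines()` reads from the current position to the end of the file, so once the first call has consumed the stream, any later call on the same object has nothing left to read. The sketch below shows the same symptom without any threads, using an in-memory `io.StringIO` as a stand-in for the file (the sample contents are made up).

```python
import io

# Stand-in for the real file: four lines in an in-memory text stream.
file_obj = io.StringIO("alpha\nbeta\ngamma\ndelta\n")

# The first call reads the whole stream, then the slice keeps two lines.
first = file_obj.readlines()[0:2]

# The stream is already at end-of-file, so this call returns an empty list
# and the slice of an empty list is empty as well.
second = file_obj.readlines()[2:4]

print(first)
print(second)
```

If this is what is happening, the other threads do run. They just iterate over an empty list, which would also explain why log messages placed inside the loop never appear for them.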
HELP: I'm new to Python and have very little experience with threads. I'm very probably missing some basic thing here. If you can help me I would be immensely grateful.
NOTE: I would like to make the current code work; I know there are other options to accomplish the same task, but I would like to make this one work in the first place.
#! python3
#Search string in huge ".txt" file
import pprint
from threading import Thread
#Search function that takes arguments for searched value, file object, start line and end line of search
def searchFile(value, fileObj, linestart, lineend):
    results = []
    textToSearch = fileObj.readlines()[linestart:lineend]
    for line in textToSearch:
        if value in line:
            results.append(line[0:-1])
    return results
#Thread class that actually returns the function's return value
class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs)
        self._return = None
    def run(self):
        print(type(self._target))
        if self._target is not None:
            self._return = self._target(*self._args,
                                        **self._kwargs)
    def join(self, *args):
        Thread.join(self, *args)
        return self._return
#File variables
fileToSearch = 'somefile.txt'
totalLines = 1000000 #Let's suppose the file has 1 million lines
sizeThread = totalLines // 8 #Each thread will cover 1/8 of the file's lines
#Main program loop
while True:
    #Ask user for input
    userinput = input('Enter here your input: \n')
    #Open file
    SearchFileObj = open(fileToSearch)
    #Create empty search results list
    searchResults = []
    #Create empty Threads list
    searchThreads = []
    #Create 8 Threads
    for i in range(0, totalLines, sizeThread):
        ThreadlLineStart = i
        ThreadlLineEnd = i + sizeThread
        #Create Thread Objects
        threadObj = ThreadWithReturnValue(target=searchFile, args=[userinput, SearchFileObj, ThreadlLineStart, ThreadlLineEnd])
        searchThreads.append(threadObj)
        threadObj.start()
    #Wait for Threads to finish
    for thread in searchThreads:
        threadResult = thread.join()
        #Verify thread result and append result (if any) to list
        if threadResult == []:
            continue
        else:
            searchResults.append(threadResult)
    #Close file
    SearchFileObj.close()
    #Give output to user
    if searchResults == []:
        print('No results found.')
    else:
        print('The results are:')
        pprint.pprint(searchResults)
        break
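
One way the same design might be made to work is to read the lines once in the main thread and hand each thread its own slice of the list, so no file object is shared between threads. The following is a minimal sketch under that assumption, not a tested drop-in replacement: `search_lines` and `threaded_search` are hypothetical names, the thread class is a simplified variant of the one above, and the sample data is made up.

```python
from threading import Thread


class ThreadWithReturnValue(Thread):
    """Thread that stores the target's return value and returns it from join()."""

    def __init__(self, target, args=()):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._result = None

    def run(self):
        self._result = self._fn(*self._fn_args)

    def join(self, timeout=None):
        super().join(timeout)
        return self._result


def search_lines(value, lines):
    """Return every line containing value, without the trailing newline."""
    return [line.rstrip("\n") for line in lines if value in line]


def threaded_search(value, lines, n_threads=8):
    """Search a list of lines using up to n_threads threads, one slice each."""
    # Ceiling division so the slices cover every line; at least 1 to keep range() valid.
    size = max(1, -(-len(lines) // n_threads))
    threads = []
    for start in range(0, len(lines), size):
        # Each thread receives its own list slice instead of a shared file object.
        t = ThreadWithReturnValue(target=search_lines,
                                  args=(value, lines[start:start + size]))
        threads.append(t)
        t.start()
    results = []
    for t in threads:
        # Joining in creation order keeps the results in file order.
        results.extend(t.join())
    return results


# Made-up sample data standing in for the result of fileObj.readlines().
sample = [f"line {i}\n" for i in range(100)]
hits = threaded_search("line 9", sample)
```

With a real file, `sample` would be replaced by the list returned from a single `readlines()` call. Note that in CPython the global interpreter lock may prevent threads from speeding up a CPU-bound substring search, so this sketch addresses the empty results rather than the performance goal.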