We are dealing with an extremely long-running (several minutes) Django ORM keyword text-search query that is causing timeouts.
For reasons we've yet to diagnose, when a keyword search is run against our Postgres database of over 1 million text records, the query times out despite our having implemented what we believe is the recommended approach for improving performance in such a scenario (e.g. a GIN index over a to_tsvector column).
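For context, the indexing side resembles the sketch below (the model name TextFile and the content field are illustrative; textfiles_tsv mirrors the column queried later):

from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models

class TextFile(models.Model):
    content = models.TextField()
    # Populated separately (e.g. by a trigger); this is the column the
    # raw plainto_tsquery() filter below runs against.
    textfiles_tsv = SearchVectorField(null=True)

    class Meta:
        indexes = [GinIndex(fields=["textfiles_tsv"])]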
There are probably many stones left unturned, but as a temporary fix, initial Googling points to using the multiprocessing module to manage/stop the hanging query. For example:
import multiprocessing

def run_problem_db_query(textFileSearchParameter, returned_statements):
    # statements is the QuerySet built elsewhere; this is the db query that hangs.....
    retrieved_text_files = statements.extra(
        where=["textfiles_tsv @@ plainto_tsquery(%s)"],
        params=[textFileSearchParameter],
    )
    # QuerySets are lazy, so force evaluation inside the child; a plain
    # return value would never reach the parent process, hence the shared list.
    returned_statements.extend(retrieved_text_files)

manager = multiprocessing.Manager()
returned_statements = manager.list()
p = multiprocessing.Process(
    target=run_problem_db_query,
    name="run_problem_db_query",
    args=(textSearch, returned_statements),
)
p.start()
p.join(10)        # wait up to 10 seconds for the query
if p.is_alive():  # still running: give up and kill the child
    p.terminate()
    p.join()
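Some of the posts we found use multiprocessing.Pool instead, along these lines (a sketch using the same assumed statements and textSearch names; run_query is a hypothetical helper that returns the result rather than writing to a shared list):

from multiprocessing import Pool, TimeoutError

def run_query(textFileSearchParameter):
    # Same assumed query as above; list() forces evaluation in the worker.
    return list(statements.extra(
        where=["textfiles_tsv @@ plainto_tsquery(%s)"],
        params=[textFileSearchParameter],
    ))

with Pool(processes=1) as pool:
    async_result = pool.apply_async(run_query, (textSearch,))
    try:
        retrieved_text_files = async_result.get(timeout=10)
    except TimeoutError:
        retrieved_text_files = None  # leaving the with-block terminates the worker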
A few blogs and SO posts have recommended a version of this approach, but what are its potential problems (if any)?
Ultimately we want to fix the query itself, of course, along with any underlying issues in the db, but aside from the potentially large memory footprint (as mentioned here), what other pitfalls does this approach have?
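For comparison, we are aware that a server-side cap is possible by setting Postgres's statement_timeout through Django's connection OPTIONS, roughly as below (database name and the 10-second value are illustrative), but we would still like to understand the multiprocessing pitfalls:

# settings.py -- Postgres cancels any statement running longer than the
# timeout, surfacing an OperationalError in Django instead of hanging.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "ourdb",  # illustrative
        "OPTIONS": {
            "options": "-c statement_timeout=10000",  # milliseconds
        },
    }
}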