This is a broad and fairly complicated topic. The short version is that multi-threading only gets you so far. Why? Because the standard implementation of Python, CPython, uses the GIL (Global Interpreter Lock), a mutex that prevents multiple threads from executing Python bytecode at the same time, so multithreaded CPython programs cannot take full advantage of multiprocessor systems in certain situations. This is not a big problem when the tasks running in the threads are I/O bound (that is, they spend most of their time waiting for input/output, such as a network call to a service, a file read from disk, or a database query), because a thread that is waiting on I/O releases the GIL and lets another thread run; even so, only one thread executes Python bytecode at any given moment. Creating threads indiscriminately has its own perils, too: with a large number of threads, the context switching between them becomes expensive. For that reason it is sometimes better to use async programming, which runs on a single thread; unfortunately, when the tasks are CPU bound, as in your case, async programming does not help much.
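To see the effect for yourself, here is a minimal sketch that runs the same CPU-bound function sequentially and then in a thread pool. The function busy_sum and the job sizes are made up for illustration; they are not taken from your code. On a standard GIL-enabled CPython build I would expect the two timings to be close, but the exact numbers depend on your machine and Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_sum(n):
    """Hypothetical CPU-bound task: a pure-Python loop that holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    """Run every job one after another in the calling thread."""
    return [busy_sum(n) for n in jobs]


def run_threaded(jobs, workers=4):
    """Run the jobs in a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(busy_sum, jobs))


if __name__ == "__main__":
    jobs = [1_000_000] * 4  # arbitrary sizes, chosen only for the demo

    start = time.perf_counter()
    sequential = run_sequential(jobs)
    t_sequential = time.perf_counter() - start

    start = time.perf_counter()
    threaded = run_threaded(jobs)
    t_threaded = time.perf_counter() - start

    # Both approaches compute the same values; only the timing is of interest.
    print(f"sequential: {t_sequential:.2f}s  threaded: {t_threaded:.2f}s")
    print("same results:", sequential == threaded)
```

If busy_sum were instead waiting on a socket or a file, the threaded version would normally come out well ahead, which is the I/O-bound case described above.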
Given the above, and given that your work is CPU bound (it is an intensive computation that does not depend on I/O), what will improve the execution speed is multiprocessing and/or distributed computing. How much you gain from multiprocessing depends on how many cores or CPUs your machine has, and you must also take into account that communication between processes adds its own overhead. Distributed computing would be the best option, as long as you have enough machines in a cluster to handle the workload. In either case, one of the challenges you will face is how to distribute the workload among the different processes or machines in the cluster.
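As a rough illustration of the multiprocessing route, the sketch below splits a CPU-bound job into chunks and hands them to a pool of worker processes using concurrent.futures.ProcessPoolExecutor from the standard library. The prime-counting task and the helpers count_primes, split_range and parallel_count_primes are hypothetical stand-ins for your own computation, and the choice of four chunks per worker is just an assumption to even out the load.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Hypothetical CPU-bound task: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division up to the square root of n.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into at most `parts` contiguous chunks of similar size."""
    step = max(1, -(-(hi - lo) // parts))  # ceiling division, at least 1
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def parallel_count_primes(lo, hi, workers=None):
    """Distribute the chunks over worker processes and add up the partial counts."""
    workers = workers or os.cpu_count() or 1
    # More chunks than workers, so a slow chunk does not leave other cores idle.
    chunks = split_range(lo, hi, workers * 4)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this module.
    print(parallel_count_primes(0, 50_000))
```

Each chunk is pickled and sent to a worker, and each partial result is sent back the same way, so this pays off only when the work per chunk clearly outweighs that communication cost. How you cut the chunks is exactly the work distribution problem mentioned above.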
There are several ways to do multiprocessing and distributed computing in Python; the following references are a starting point for further research:
Speed Up Your Python Program With Concurrency
What Is the Python Global Interpreter Lock (GIL)?
Modern Parallel and Distributed Python: A Quick Tutorial on Ray