If you are using the CPython (or PyPy) implementation of Python, then the global interpreter lock (GIL) will prevent more than one thread from executing Python bytecode at a time.
So if you are using such an implementation, you'll need to use multiple processes instead of multiple threads to take advantage of your 32 processors.
You could use the standard library's multiprocessing or concurrent.futures modules to spawn the worker processes. There are also many third-party options. Doug Hellmann's tutorial is a great introduction to the multiprocessing module.
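For example, here is a minimal sketch of a worker pool using concurrent.futures; do_work and tasks are hypothetical stand-ins for your own function and inputs:

    import concurrent.futures

    def do_work(item):
        # Placeholder for your CPU-bound work on one item.
        return item * item

    if __name__ == "__main__":
        tasks = range(100)
        # max_workers defaults to the machine's processor count,
        # so this would use all 32 processors.
        with concurrent.futures.ProcessPoolExecutor() as executor:
            results = list(executor.map(do_work, tasks))
        print(results[:5])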
Since you only need read-only access to the data structure, if you assign the complex data structure to a global variable before you spawn the processes, then all the processes will have access to this global variable.
When a worker process is created, the globals from the calling module are available to it. On Linux, where fork gives copy-on-write semantics, the spawned processes share the very same pages of memory as the parent, so no extra memory is required. A page is copied to a new location only when a process modifies it.
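A sketch of that approach, assuming a Linux host where multiprocessing forks its workers by default; big_structure and lookup are hypothetical names:

    import multiprocessing

    # Built at module level, before the pool is created, so fork()'d
    # workers inherit it copy-on-write instead of receiving a copy.
    big_structure = {i: i * i for i in range(1_000_000)}

    def lookup(key):
        # Reads the inherited global; only `key` and the result are
        # pickled between processes, not the structure itself.
        return big_structure[key]

    if __name__ == "__main__":
        with multiprocessing.Pool() as pool:
            print(pool.map(lookup, [1, 2, 3]))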
On Windows, since there is no fork, each spawned process starts a fresh Python interpreter and re-imports the calling module, so each process requires memory for its own separate copy of the huge data structure. There must be some other way to share data structures on Windows, but I'm unaware of the details. (Edit: POSH may be a solution to the shared-memory problem, but I haven't tried it myself.)
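One practical consequence of that re-import is that on Windows the code that launches the workers must sit behind an if __name__ == "__main__" guard, or each freshly spawned process would try to spawn a pool of its own:

    import multiprocessing

    def worker(n):
        return n + 1

    # This block is not re-run when the module is re-imported by a
    # spawned child, which is what stops the recursion on Windows.
    if __name__ == "__main__":
        with multiprocessing.Pool() as pool:
            print(pool.map(worker, range(4)))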