Echoing @errantlinguist: please be aware that parallel performance is highly application-dependent.
To maintain GUI responsiveness, yes, I would just use a separate "worker" thread to keep the main thread available to handle GUI events.
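A minimal, toolkit-agnostic sketch of that worker-thread pattern (`heavy_computation` is a hypothetical stand-in for your real task; a GUI framework would replace the `join` with an event-loop callback or polling):

```python
import queue
import threading

def heavy_computation(n):
    # Placeholder for the long-running work that would otherwise freeze the GUI.
    return sum(i * i for i in range(n))

results = queue.Queue()

def worker(n):
    # Runs off the main thread, so the main (GUI) thread stays free for events.
    results.put(heavy_computation(n))

t = threading.Thread(target=worker, args=(100_000,), daemon=True)
t.start()
# ... the main thread keeps handling GUI events here ...
t.join()
answer = results.get()
```

In a real GUI you would not `join` (that blocks the main thread); instead you would poll the queue from a timer or use the toolkit's "call on main thread" mechanism to deliver the result.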
To do something "embarrassingly parallel", like a Monte Carlo computation, where you have many completely independent tasks with minimal communication between them, I might try multiprocessing.
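For instance, a Monte Carlo estimate of pi splits naturally into independent batches, one per process (the batch sizes here are arbitrary illustration values):

```python
import random
from multiprocessing import Pool

def count_hits(n_samples):
    # Count random points in the unit square that land inside the quarter-circle.
    rng = random.Random()
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    n_tasks, per_task = 8, 100_000
    with Pool() as pool:  # defaults to one worker process per CPU core
        hits = pool.map(count_hits, [per_task] * n_tasks)
    pi_estimate = 4 * sum(hits) / (n_tasks * per_task)
    print(pi_estimate)  # close to 3.14159
```

Each process gets its own memory and its own random state, and the only communication is the tiny per-batch hit count returned at the end, which is exactly the shape of problem multiprocessing handles well.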
If I were doing something like very large matrix operations, I would do it multithreaded. Anaconda will automatically parallelize some numpy operations via MKL on Intel processors (but this will not help you on ARM). If you stay in Python, you could look at something like numba to help with this.

If you are unhappy with performance, you may want to try implementing in C++. If almost all of your operations are vectorized numpy calls, you should not see a big difference from C++; but as Python loops and the like start to creep in, you will probably begin to see big performance differences (beyond the 4x maximum you would gain by parallelizing your Python code over 4 cores). If you do switch to C++ for matrix operations, I highly recommend the Eigen library. It's very fast and easy to understand at a high level.
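To make the vectorized-vs-loop point concrete, here is a small sketch comparing the two ways of computing the same matrix-vector product (sizes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
x = rng.standard_normal(500)

# Vectorized: one call dispatched to compiled BLAS
# (possibly MKL-backed and multithreaded under Anaconda).
y_fast = A @ x

# Pure-Python loops: same result, but orders of magnitude slower at scale.
y_slow = np.array(
    [sum(A[i, j] * x[j] for j in range(500)) for i in range(500)]
)

assert np.allclose(y_fast, y_slow)
```

The loop version is where moving to C++ (or numba) pays off; the `A @ x` version is already running compiled code, so rewriting it in C++ buys you little.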
Please be aware that when you use multithreading, you are usually in a shared-memory context, which eliminates a lot of the expensive inter-process communication you will encounter in multiprocessing, but it also introduces classes of bugs you are not used to encountering in serial programs (race conditions, where two threads access the same resource concurrently). In multiprocessing, memory is usually separate, except for explicitly defined communication between the processes. In that sense, I find that multiprocessing code is typically easier to understand and debug.
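A classic instance of that shared-memory hazard is two threads incrementing one counter; the lock is what makes the result deterministic:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # `counter += 1` is a read-modify-write; without the lock, two
        # threads can interleave and silently lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 with the lock; without it, often less
```

Bugs like this never arise in multiprocessing's default separate-memory model, which is part of why I find that style easier to debug.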
Also, there are frameworks out there for handling complex computational graphs with many steps, which may mix multithreading and multiprocessing (try dask).
Good luck!