Is it too expensive to create a new process for each time-step...?
Yes, this is always expensive and often very expensive. Persistent processes, which do not make you pay the constant overhead costs for each time-slice processed, are a more promising option here, but several additional factors have to be taken into account first.
All process-instantiation/termination overhead costs become relatively more expensive the less mathematically dense/complex the task to compute is. So if your processing is cheap in the [TIME]-domain, all the overhead costs will look the more expensive ( as your processing will have to pay them many times in a row ... )
All process instantiations will also pay remarkable overhead costs for memory (re-)allocations of data in the [SPACE]-domain ( whereas having feasible semi-persistent data structures that persistent processes can work with, plus in-place matrix operators, may save you a lot by avoiding memory-allocation overhead ... a very important topic for large-scale matrices, as in numerical-mathematics processing for FEM, ANN, video/image-kernel applications etc. )
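To make the [SPACE]-domain point concrete, here is a minimal sketch of the in-place idea; the names ( N, state, kernel, scratch ) and the toy update are hypothetical, not taken from the original post:

    import numpy as np

    N       = 4096                               # matrix size ( an assumption )
    state   = np.random.rand( N, N )             # semi-persistent data held by a worker
    kernel  = np.random.rand( N, N )
    scratch = np.empty_like( state )             # pre-allocated once, re-used every step

    def step_allocating( state, kernel ):
        # each call allocates a fresh N x N result array in the [SPACE]-domain
        return state * kernel + 1.0

    def step_in_place( state, kernel, scratch ):
        # in-place operators re-use the pre-allocated buffer -> zero new allocations per step
        np.multiply( state, kernel, out = scratch )
        scratch += 1.0
        return scratch

The second variant pays the allocation cost once, outside the time-stepping loop, which is exactly where persistent processes can keep such buffers alive.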
Do not rely on gut feelings alone.
Review all the details of this logic in the overhead-aware re-formulation of Amdahl's Law, so as to have all the quantitative figures before deciding on this design dilemma. Benchmark each of the processing stages: the process instantiations, the memory-transfer costs ( of parameters ), the processing costs of the computing phase "inside" the one-step-forward computations, and the costs of re-distributing the results among all involved counterparties.
Only then will you be able to quantitatively decide on a break-even point, beyond which adding more processes will not improve the processing ( will stop lowering the overall duration and will start adding more overhead costs than the parallel-process-accelerated computing can manage to cover ).
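A minimal sketch of one common overhead-aware re-formulation of Amdahl's Law follows; all numeric figures are placeholders to be replaced with your own benchmarks:

    def speedup( T_serial, T_parallelisable, N, T_overhead_per_process ):
        """
        T_serial               [s]  part that never parallelises ( incl. result re-distribution )
        T_parallelisable       [s]  part that splits evenly across N processes
        N                      [-]  number of worker processes
        T_overhead_per_process [s]  instantiation + termination + parameter-transfer costs
        """
        T_1 = T_serial + T_parallelisable                  # single-process duration
        T_N = ( T_serial
              + T_parallelisable / N
              + N * T_overhead_per_process )               # overhead grows with N
        return T_1 / T_N

    # sweep N to locate the break-even point, after which more processes start to hurt
    for N in ( 1, 2, 4, 8, 16, 32 ):
        print( N, round( speedup( 0.5, 10.0, N, 0.2 ), 2 ) )

The assumed linear overhead term N * T_overhead_per_process is only a model; measure your actual per-process costs and substitute them.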
Is it better to have the process running for the whole time?
This may help a lot, as it avoids paying the repetitive costs of process instantiation and termination.
Yet there are costs for signalling and data re-propagation among all such computing-service processes. Next, all processes have to fit inside real RAM, so as not to lose to swap-out/swap-in [SPACE]-motivated tidal waves, which flow indeed very slowly and would kill the whole idea of [TIME]-motivated performance increases.
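Here is a minimal sketch of such a persistent computing-service process, assuming a queue-based signalling scheme; the names ( worker, task_q, result_q ) and the toy per-step update are illustrative only:

    import numpy as np
    from multiprocessing import Process, Queue

    def worker( task_q, result_q, N = 1024 ):
        state = np.zeros( ( N, N ) )             # allocated once, lives as long as the process
        while True:
            step_input = task_q.get()            # signalling cost paid per time-step
            if step_input is None:               # poison pill -> clean termination
                break
            state += step_input                  # in-place one-step-forward update
            result_q.put( state.sum() )          # send back a small result, not the whole matrix

    if __name__ == "__main__":
        task_q, result_q = Queue(), Queue()
        p = Process( target = worker, args = ( task_q, result_q ) )
        p.start()                                # instantiation cost paid exactly once
        for t in range( 100 ):                   # 100 time-steps served by the same process
            task_q.put( 1.0 )
            print( t, result_q.get() )
        task_q.put( None )
        p.join()

Note that the per-step queue traffic is kept small on purpose: shipping the full matrix back each step would re-introduce exactly the data re-propagation costs warned about above.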
Do benchmark + vectorise + best of all, JIT/LLVM-compile the code. A must!
This is your strongest lever for performance increases, given python is your language of choice. If you are serious about performance, there is no need to say more here. numpy + numba are just great for this. If shaving off the last few [ns] from already performant code, narrow specialisations of calling interfaces and better vectorisation alignments ( cache friendliness ) are your tools for that.
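A minimal numba sketch, assuming numba is installed; the function name and the toy one-step-forward update are illustrative, not the original code:

    import numpy as np
    from numba import njit

    @njit( fastmath = True, cache = True )       # LLVM-compiled, no Python interpreter in the loop
    def one_step_forward( state, kernel ):
        out = np.empty_like( state )
        for i in range( state.shape[0] ):        # explicit loops are fine once JIT-compiled
            for j in range( state.shape[1] ):
                out[i, j] = state[i, j] * kernel[i, j] + 1.0
        return out

    state  = np.random.rand( 2048, 2048 )
    kernel = np.random.rand( 2048, 2048 )
    one_step_forward( state, kernel )            # 1st call pays the JIT-compilation cost
    # %timeit one_step_forward( state, kernel )  # benchmark the 2nd+ calls, not the 1st

When benchmarking, always exclude the first call, which includes the one-off compilation overhead, from the per-time-step figures you feed into the break-even analysis above.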