I've been trying to exploit parallelization to run some simulations with the MEEP simulation software a bit faster. By default the software only uses one CPU, and FDTD simulations are easily sped up by parallelization. In the end I found no difference between running on 1 core or 4; the simulation times were the same.
I then figured I would instead run independent simulations on separate cores to increase my total throughput (for example, running 4 different simulations at the same time).
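To make the setup concrete, the way I start the independent runs is essentially something like the sketch below (the script names are placeholders and the exact details differ a bit; taskset is Linux-specific, and I use it just to pin each run to its own core):

```python
import subprocess

# Launch one independent MEEP run per core, pinned with taskset so the
# processes shouldn't be competing for CPU time. Script names are placeholders.
scripts = ["sim_1.py", "sim_2.py", "sim_3.py", "sim_4.py"]
procs = [
    subprocess.Popen(["taskset", "-c", str(core), "python", script])
    for core, script in enumerate(scripts)
]
for p in procs:
    p.wait()
```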
What I found surprising is that whenever I start a new simulation, the already running simulations slow down, even though they are on separate cores. For example, if I run only 1 simulation on 1 core, each time step of the FDTD simulation takes around 0.01 seconds. If I start another process on another core, each simulation now spends 0.02 seconds per time step, and so on. In other words, even when I run completely unrelated simulations on separate cores, they all slow down, giving me no net increase in throughput.
I'm not necessarily looking for help to solve this problem as much as I'm looking for help understanding it, because it piqued my curiosity. Each instance of the simulation requires less than 1% of my total memory, so it's not a memory capacity issue. The only things I can think of are the cores sharing the cache, or the memory bandwidth being saturated. Is there any way to check whether this is the case?
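One crude check I can think of (I'm not sure it's a valid test, and the array size and process counts below are just guesses) is a small streaming benchmark: each process repeatedly copies an array far larger than the caches, and if the per-process copy rate drops as more processes run, that would point at memory bandwidth rather than the CPUs themselves:

```python
import time
import numpy as np
from multiprocessing import Process, Queue

def stream(q, n_bytes=200_000_000, repeats=20):
    # Repeatedly copy a large array so the working set never fits in cache,
    # forcing traffic to main memory on every pass.
    a = np.ones(n_bytes // 8)      # ~200 MB of float64
    b = np.empty_like(a)
    t0 = time.perf_counter()
    for _ in range(repeats):
        np.copyto(b, a)
    dt = time.perf_counter() - t0
    # Each pass reads a and writes b, so roughly 2 * n_bytes moved per repeat.
    q.put(2 * n_bytes * repeats / dt / 1e9)   # GB/s for this process

if __name__ == "__main__":
    for n_procs in (1, 2, 4):
        q = Queue()
        procs = [Process(target=stream, args=(q,)) for _ in range(n_procs)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        rates = [q.get() for _ in range(n_procs)]
        print(f"{n_procs} processes: {sum(rates):.1f} GB/s total, "
              f"{sum(rates) / n_procs:.1f} GB/s per process")
```

Would a test like this actually distinguish a saturated memory bus from cache contention, or is there a better tool for this?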
The simulations are fairly simple, and I've run programs that are much more memory-hungry than this one and still gotten great speedups from parallelization.
Any tips to help me understand this phenomenon?