I'm writing an N-Body simulation, and for computational simplification I've divided the whole space into a number of uniformly-sized regions.
For each body, I compute the force of all other bodies in the same region, and for the other regions I aggregate the mass and distances together so there's less work to be done.
I have a List<Region>
and Region
defines public void Index()
which sums the total mass at this iteration.
I have two variants of my Space.Tick()
function:
public void Tick()
{
foreach (Region r in Regions)
r.Index();
}
This is very quick. For 20x20x20 = 8000 regions with 100 bodies each = 800000 bodies in total, it only takes about 0.1 seconds to do this. The CPU graph shows 25% utilisation on my quad-core, which is exactly what I would expect.
Now I write this multi-threaded variant:
public void Tick()
{
Thread[] threads = new Thread[Environment.ProcessorCount];
foreach (Region r in Regions)
while (true)
{
bool queued = false;
for (int i = 0; i < threads.Length; i++)
if (threads[i] == null || !threads[i].IsAlive)
{
Region s = r;
threads[i] = new Thread(s.Index);
threads[i].Start();
queued = true;
break;
}
if (queued)
break;
}
}
So a quick explanation in case it's not obvious: threads
is an array of 4, in the case of my CPU. It starts off being 4xnull
. For each region, I loop through all 4 Thread
objects (which could be null
). When I find one that's either null
or isn't IsAlive
, I queue up the Index()
of that Region
and Start()
it. I set queued
to true
so that I can tell that the region has started indexing.
This code takes about 7 seconds. That's 70x slower. I understand that there's a bit of overhead involved with setting up the threads, finding a thread that's vacant, etc. But I would still expect that I would have at least some sort of performance gain.
What am I doing wrong?