I'm trying to implement a multithreaded game loop. My first attempt worked, but it needed several locks, which ruined performance. After some research I came up with this idea:
Instead of splitting the engine's subsystems across different threads (e.g. one for physics, one for animation), all subsystems run on all threads. So on a machine with four CPU cores, four threads are created, each running one loop that covers all subsystems; in effect, the single-core game loop is copied onto all four threads. These game loops are controlled by one additional loop, which sends messages (or 'jobs', 'tasks') to one of these threads (chosen by its current load) in response to user input or scripts. This could be done with a double-buffered command buffer.
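A minimal C++ sketch of this dispatch idea, under stated assumptions: `Command`, `WorkerQueue`, `Dispatcher`, and `demo_dispatch` are hypothetical names, "load" is approximated by the number of queued commands, and threads are spawned once per frame only to keep the example short (a real engine would presumably keep persistent worker threads). The control loop writes only to the back buffers and each worker reads only its own front buffer, so the only synchronization point is the swap at the frame boundary.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Command = std::function<void()>;  // hypothetical job type

// One double-buffered command queue per worker thread.
struct WorkerQueue {
    std::vector<Command> front;  // read by the worker during the frame
    std::vector<Command> back;   // written by the control loop only
};

class Dispatcher {
public:
    // Assumes workers >= 1.
    explicit Dispatcher(std::size_t workers) : queues_(workers) {}

    // Control loop: append the command to the back buffer with the fewest
    // queued commands (a simple stand-in for "least loaded").
    void submit(Command cmd) {
        auto it = std::min_element(
            queues_.begin(), queues_.end(),
            [](const WorkerQueue& a, const WorkerQueue& b) {
                return a.back.size() < b.back.size();
            });
        it->back.push_back(std::move(cmd));
    }

    // Frame boundary: swap the buffers, then let every worker drain its own
    // front buffer. No locks are needed while the commands run, because no
    // buffer is shared between threads during the frame.
    void run_frame() {
        for (auto& q : queues_) {
            q.front.clear();
            std::swap(q.front, q.back);
        }
        std::vector<std::thread> threads;
        for (auto& q : queues_) {
            threads.emplace_back([&q] {
                for (auto& cmd : q.front) cmd();
            });
        }
        for (auto& t : threads) t.join();
    }

private:
    std::vector<WorkerQueue> queues_;
};

// Usage: submit 100 trivial jobs to four workers and run one frame.
int demo_dispatch() {
    Dispatcher dispatcher(4);
    std::atomic<int> done{0};
    for (int i = 0; i < 100; ++i) {
        dispatcher.submit([&done] { done.fetch_add(1); });
    }
    dispatcher.run_frame();
    return done.load();
}
```

The sketch only covers how commands reach the threads. The commands themselves still have to avoid touching the same game state concurrently, which is where the original locking problem would reappear if it is not designed around.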
Only the rendering loop gets a thread to itself, for maximum rendering performance. Now I'm trying to work out the best way to communicate with it. The best idea I could come up with is to use a command buffer again and swap it whenever the rendering loop completes a frame. That way the rendering loop never has to wait for any of the other loops and can keep rendering. If the game loop hasn't finished when the rendering loop swaps the buffer, all commands submitted after the swap are executed in the rendering loop's next frame. To make sure that all objects are drawn even if the game loop hasn't finished, the rendering loop keeps its own list of objects to draw and keeps drawing each one until it receives a command to stop drawing it.
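A rough C++ sketch of the render-side command buffer, again under assumptions: `Op`, `RenderCommand`, `RenderCommandBuffer`, and `Renderer` are hypothetical names, objects are identified by plain integer ids, and the actual draw call is replaced by a counter. The sketch does use one mutex, but it is held only for a single `push_back` or a constant-time swap, so the rendering loop never waits for the game loop to finish a frame; a lock-free queue could be swapped in instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <set>
#include <utility>
#include <vector>

enum class Op { StartDrawing, StopDrawing };

struct RenderCommand {
    Op op;
    std::uint32_t object_id;
};

class RenderCommandBuffer {
public:
    // Game-loop side: queue a command for a later render frame.
    void push(RenderCommand cmd) {
        std::lock_guard<std::mutex> lock(mutex_);
        write_.push_back(cmd);
    }

    // Render-loop side: take everything submitted so far. Commands pushed
    // after this call are picked up by the next render frame.
    std::vector<RenderCommand> take() {
        std::vector<RenderCommand> read;
        std::lock_guard<std::mutex> lock(mutex_);
        std::swap(read, write_);
        return read;
    }

private:
    std::mutex mutex_;
    std::vector<RenderCommand> write_;
};

class Renderer {
public:
    // One iteration of the rendering loop; returns how many objects were drawn.
    std::size_t render_frame(RenderCommandBuffer& buffer) {
        for (const RenderCommand& cmd : buffer.take()) {
            if (cmd.op == Op::StartDrawing) {
                visible_.insert(cmd.object_id);
            } else {
                visible_.erase(cmd.object_id);
            }
        }
        std::size_t drawn = 0;
        for (std::uint32_t id : visible_) {
            (void)id;  // placeholder for the real draw call
            ++drawn;
        }
        return drawn;
    }

private:
    // Retained set: objects stay here until a StopDrawing command arrives.
    std::set<std::uint32_t> visible_;
};
```

Because the renderer retains the set of visible objects, a frame in which no new commands arrive simply redraws the previous set, which matches the behaviour described above.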
My goal is to make the engine scale with the number of CPU cores and use all of them. Is this a viable way to do that? What is the best approach, and how do modern engines handle this?