Unfortunately, I don't think I can offer you better than "it depends", since anything approaching an ideal is going to be relative to the design requirements of a particular engine and the sensibilities of its programmers. I might be able to offer some thoughts on the individual sections you listed, though.
- A list of entities, which store components inside their structure (e.g. some sort of unordered_map<type_info, IComponent*> in C++). Systems keep track of their own list of which currently active entities possess the components they operate upon.
If you are favoring an ECS architecture for efficiency and not just its flexibility (although there's nothing wrong with favoring an ECS just for flexibility), then storing component lists directly inside individual entities, rather than outside in parallel lists (one list per component type), is generally going to run counter to that goal, for reasons that go beyond the per-entity component lookup.
The procedural (or sometimes functional) paradigm of ECS is frequently contrasted with object-oriented programming, and that includes performance. One of the fundamental performance disadvantages of OOP in the most critical execution paths is that object-oriented design favors interleaving/bundling data based not on optimal memory access patterns but on human and software-engineering factors, for the sake of encapsulation and maintaining invariants. The ECS goes against that by immediately breaking encapsulation in favor of separating logic (systems) from data (components). Such a design immediately opens up room for more optimal data representations, memory layouts, and optimizations like hot/cold splitting, but not if you just end up storing/bundling/interleaving all such data directly inside entities.
So if I were to venture towards anything resembling canonical, and exploiting the full benefits that an ECS can potentially provide, it would not store component data directly inside of entities. It would store them outside so that, for example, rarely-accessed data can be split apart and separated from commonly-accessed data rather than interleaved together and wastefully loaded into cache lines in critical paths.
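As a rough sketch of what storing them "outside" can mean (the layout and names here are purely illustrative, not from any particular engine), the entity reduces to an ID used to index per-type component lists:

#include <cstdint>
#include <vector>

using Entity = std::uint32_t; // an entity is just an ID, not a container

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// Components live outside entities, one contiguous list per type. A dense,
// entity-indexed layout is the simplest possible scheme; real engines tend
// to use sparse sets, archetypes, and so on instead.
std::vector<Position> positions;  // positions[e] belongs to entity e
std::vector<Velocity> velocities; // velocities[e] belongs to entity e

With this kind of separation in place, the hot/cold splitting described below becomes a matter of choosing which fields go into which list.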
- Same as 1, except systems iterate globally over all entities and check which components they contain.
From my understanding, that would be required of #1 anyway, and it would be sub-optimal for the same reasons given above.
- There is a separate list for each type of component, and there is no actual stored list of 'Entities' - systems iterate over their relevant components and must somehow find the other associated components that belong to the same entity, via some unique ID that connects them. Keeping components in a list like this is supported as supposedly improving cache locality (though I don't see how, since you'll still be searching several large lists to find the related components on a particular entity), and not having an actual 'entity' type is supposedly a sign of a 'pure' ECS.
This is generally required to even have the possibility of approaching optimal locality of reference. One of the simplest examples is hot/cold field splitting. Consider a case like this:
struct Foo
{
    // Accessed all the time, every single frame.
    int32_t x, y;

    // Accessed hardly ever, and only by the UI.
    uint64_t id;
};
In this case, we will be wasting half the memory of each CPU cache line in critical execution paths when sequentially iterating through an array of Foo, by loading this id field (which won't even be accessed in those paths) along with x and y. With the ECS we can separate the commonly-accessed fields, x and y, away from the rarely-accessed field, id, by creating two or more separate component types to store them, and two or more component lists. The critical execution paths won't even bother to access the list containing the rarely-accessed field. It gets even more involved if we consider SIMD vectorization and so forth, but this hot/cold field splitting example should be the easiest place to start understanding why this separation can be very beneficial from a performance standpoint.
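Applied to Foo above, a minimal sketch of that split might look like this (the FooHot/FooCold names are just illustrative):

#include <cstdint>
#include <vector>

// Hot component: accessed all the time, every single frame.
struct FooHot { std::int32_t x, y; };

// Cold component: accessed hardly ever, and only by the UI.
struct FooCold { std::uint64_t id; };

// Stored in separate lists, so cache lines in the critical loop carry only
// x and y: sizeof(FooHot) is 8 bytes, so a 64-byte cache line now holds 8
// useful elements instead of 4 half-wasted 16-byte Foo instances.
std::vector<FooHot> hotFoos;
std::vector<FooCold> coldFoos; // touched only by UI code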
Even when systems load more than one component type list/array, they aren't loading component data that won't be accessed in their sequential loops. They don't load in irrelevant lists, so to speak. Also, it's not necessarily slower (quite often the opposite is the case, especially when it yields greater opportunities for vectorization) to access multiple parallel arrays in a sequential fashion. Parallel arrays do tend to be slower than interleaved AoS representations for random access, but not for sequential access.
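As a sketch of that sequential case (assuming for simplicity a dense, index-aligned layout where positions[i] and velocities[i] belong to the same entity; a real engine needs some mapping step here):

#include <cstddef>
#include <vector>

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// Sequential pass over two parallel arrays. Both arrays are streamed
// linearly, which suits the hardware prefetcher and often lets the
// compiler auto-vectorize the loop.
void integrate(std::vector<Position>& positions,
               const std::vector<Velocity>& velocities,
               float dt)
{
    for (std::size_t i = 0; i < positions.size(); ++i)
    {
        positions[i].x += velocities[i].dx * dt;
        positions[i].y += velocities[i].dy * dt;
    }
}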
- Each component type has its own global container/list, but there is still a list of entity structures which keep track of which components belong to some particular entity. Systems behave as in 1 or 2.
Archetypes are one example of something like this, although that's getting really implementation-specific. To keep things simple, I'd just start with the observation that we must, necessarily, be able to have a fine-grained separation of data in order to organize it for optimal access patterns.
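To keep it concrete without committing to archetypes, one deliberately naive sketch (EntityRecord is a name I'm making up for illustration) keeps the entity as a thin record of indices into the external component lists:

#include <cstdint>
#include <unordered_map>

using Entity = std::uint32_t;

// Hypothetical per-entity record: it stores indices into the per-type
// component lists, not the component data itself, so the lists stay
// contiguous and systems keep their fast sequential access. kNone marks
// a component the entity doesn't have.
struct EntityRecord
{
    static constexpr std::uint32_t kNone = UINT32_MAX;
    std::uint32_t position = kNone; // index into the Position list
    std::uint32_t uiInfo = kNone;   // index into the UiInfo list
};

std::unordered_map<Entity, EntityRecord> records;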
I have also found some arguing in favour of "one component type per system" - which would simplify some of the challenges of system 3, but would make very little sense overall.
From my standpoint it's nonsensical. Aside from being extremely restrictive, at that point you might as well program things in an object-oriented way: put methods back in your components, make their data members private, and regain the benefits of encapsulation, since there's only one place that accesses them. Restricting ourselves to one component type per system throws out the benefits of encapsulation and information hiding while gaining little in return.