6

I wonder how to implement the fastest version of an entity component system (ECS from now on) in C++.

First off, regarding terminology:

  • a Scene is a container for Entities (and Systems in some implementations)
  • a Component is a simple data storage (like position, collision box, image to render etc)
  • a System performs logic on Components fitting the System's requirements (this could be physics, player input, simple rendering etc.)
  • an Entity contains several Components to make up the final behavior

I listed all the designs we came up with below.


1. The "naive" way

The Scene contains all Entities unordered.
As the Systems update, every System has to loop through all Entities and check whether each Entity contains all required Components and then perform the update upon those Entities.

Obviously, this way is not too performant when having a lot of Systems and/or a lot of Entities.


2. Using "bitmask enums" and mapping

Each Component contains a type identifier in the form of a bitmask (e.g. 1u << 5 / binary [0...]100000). Each Entity can then compose all Component's type identifiers (assuming all typeIDs are unique inside the Entity), so it looks something like

1u << 5 | 1u << 3 | 1u << 1
binary [0...]101010

The Scene contains some kind of map where Systems can easily look up fitting Entities:

MovementSystem::update() {
    for (auto& kv : parent_scene_) { // kv is pair<TypeID_t, vector<Entity *>>
        if (kv.first & (POSITION | VELOCITY))
            update_entities(kv.second); // update the whole set of fitting entities
    }
}

Pros:

  • Faster than the naive way

Cons:

  • Systems have to look up appropriate Entities every single time they are updated.
  • A bitmask (enum) is limited to a number of bits (32 for uint32_t, at least 64 for unsigned long long) and in some cases you might need more Components than the bitmask allows.

3. Using no Systems

This method is described by Danvil in an answer below.

Pros:

  • Gets rid of the bitmask thing completely.
  • Likely to be faster than design #2.

Cons:

  • Relies on dynamic_cast for looking up a component whereas design #2 can directly look up a component and then safely static_cast it.

4. Using spare sets

This method has been described in by skypjack in an answer below. He explained his approach in great detail, so I'd suggest you read his answer.

Terrance
  • 11,764
  • 4
  • 54
  • 80

2 Answers2

4

Another approach that I found pretty promising and I've used in a project of mine (see EnTT on GitHub) is based on sparse sets.
I used them within the component pools, to keep track of which entity has a component associated and what's its slot.

Major benefits are:

  • You get for free a small array of all the entities that have the specific component (see here for further details) and this gives you a great boost in terms of performance when you iterate over them.

  • It keeps at a minimum the number of components actually assigned. Moreover, components are all kept compact in memory.

  • Cache misses are reduced at a minimum, for you pay only for what you use. In other terms, you get only those components that are actually assigned to an entity and all of them are near to each other in memory, no holes at all.

You get it at the price of an extra array with max length equal to the number of entities for each component pool (note that usually those arrays are smaller in a real world software).

Benchmarks shown that performance are far better than the ones of a well known entity component system based on bitmask descriptors (see the link above for further details). I've also verified that memory pressure is more or less the same, for you get rid of the array of bitmask descriptors but you introduce a set of mini arrays within the component pools.

Iterations over sets of entities when looking for multiple components can be highly improved too with a trick: find the shortest set and iterate over its entities (a really fast operation), then verify if the n-th entity has the other components and eventually return it.
Benchmarks proved that it's still faster than the bitmask based design on dense sets (where each entity has all the components). In case of sets not so dense (that is a reasonable assumption for a real world software), performance are definitely better than bitmask based solutions.

Finally, differently from solution #4, no dynamic cast is required in this case.

The whole thing gives you something I'd call an entity component registry. Systems can be defined as lambdas that capture the registry or functors to which you can pass the registry. There is no need to register systems with the registry itself.


I hope you got the idea behind this implementation.
If you need more details, feel free to ask.

skypjack
  • 49,335
  • 19
  • 95
  • 187
  • I really like this approach. I wonder, to improve cache coherency even further, have you experimented with "materialized joins"? The way I look at an ECS is that it is essentially a traditional database layout. Every component is a table of properties, indexed by entity ID as the primary key. The physics system would for example join the position and velocity tables (i.e. components) to do its processing. You describe an efficient way of performing this join as a hash join. Maybe this joined result could be cached between frames? – BuschnicK Mar 11 '19 at 14:14
  • 1
    @BuschnicK I've implemented a sort of materialized views with the [group functionality](https://github.com/skypjack/entt/wiki/Crash-Course:-entity-component-system#groups) of [`EnTT`](https://github.com/skypjack/entt). If you're interested in the topic, I'm also writing a series of posts about the different solutions ([here](https://skypjack.github.io/2019-02-14-ecs-baf-part-1/) is part 1 and [here](https://skypjack.github.io/2019-02-14-ecs-baf-part-1/) is part 2). More will come in future ofc. – skypjack Mar 11 '19 at 14:21
  • Thanks. That is very useful information. The group functionality sounds exactly like what I had in mind - although the documentation mainly states how to use it and not how it's implemented. Is that a blog post you have planned? I'd be very curious about your solution because I have a hard time seeing how one could do materialized views when systems are allowed to mutate components. I could see how one might sort the individual component sparse arrays so that joins between certain sets of components become more efficient. – BuschnicK Mar 11 '19 at 15:32
  • 1
    @BuschnicK the blog post (part 2) already contains some details on the group functionality. Probably I'll write more about it in future. Nice to know someone is interested. ;-) – skypjack Mar 11 '19 at 15:37
  • 1
    @BuschnicK [this](https://skypjack.github.io/2019-03-21-ecs-baf-part-2-insights/) is what you were looking for. Let me know your thoughts. ;-) – skypjack Mar 21 '19 at 15:08
  • 1
    Yes, thank you. Already read it via my RSS Reader before you posted here – BuschnicK Mar 22 '19 at 20:24
1

I would say what you call a "System" is actually a component. An example for rendering: there is a component Pose (for 3D location rotation) and a component Mesh (holds vertex buffers). Now instead of having a function which checks if it can render that particular entity, add a component Renderer. This component connects to the Pose and Mesh components. The "System" rendering now only has to communicate with the component Renderer. And each entity is either renderable or it is now, there is not need for checking components each time and all work for rendering is collected as a component.


Code example:

struct Pose : public Component { float x,y; };

struct Mesh : public Component { std::vector<Vertex> vertices; };

struct Renderer : public Component {
   Entity* entity;
   void render() {
       if(!mesh|| entity->componentsChanged) {
           mesh = entity->getComponent<Mesh>();
           if(!mesh) throw error;
       }
       if(!entity->pose) throw error;
       glTranslate(entity->pose->x, entity->pose->y);
       ...
   }
private:
   Mesh* mesh;
};

struct Entity {
    std::vector<Component*> components;
    bool componentsChanged;
    template<typename C> C* getComponent() const {
        for(Component* c : components) {
            C* cc = dynamic_cast<C>(c);
            if(cc) return cc;
        }
        return NULL;
    }
    // "fast links" to important components
    Pose* pose;
    Renderer* renderer;
    PhysicsStuff* physics;
};

struct Rendering
{
private:
    void render(const std::vector<Entity*>& entities) {
        for(Entity* e : entities) {
            if(!e->renderer) continue;
            e->renderer->render();
        }
    }
};    
Danvil
  • 22,240
  • 19
  • 65
  • 88
  • I added this design to the list above. It is an alternative to design #3 certainly, yet I'm not sure how it will change code clarity. Also, what is going to happen if there are no "fast links" for certain components? You would have to look them up which results in a lot of `dynamic_cast`ing again. Inside Rendering::render(...) for example, if there was no Entity::renderer link, you would have to find out whether that exists for _every single_ Entity. (Hope you get the point as I'm not quite sure if that was well expressed) –  May 19 '14 at 18:59
  • That's true, but it is not necessary to do this for every call to "render". See the example of `mesh` in `Renderer`. The code is verbose/incomplete, but you should get the point. Alternatively you could have a function `OnComponentsChanged` to notify components that the components of the entity changed and that links need to be updated. – Danvil May 19 '14 at 22:03
  • Yeah I got that for sure, that doesn't answer the second part of my comment though (regarding the loop stuff) –  May 20 '14 at 10:20
  • @MartinD.: The "fast links" act a bit like your bools. You would need one for every system. If this is too intrusive for some reason, then the systems themselves can manage the links (like in #3). But this would require some additional code for adding/removing/changing entities. – Danvil May 20 '14 at 15:59
  • Good point. Then, again, this would pretty much resemble #3, with the difference that components contain their functionalities themselves while "systems" are used as a calling mechanism only. –  May 20 '14 at 18:43
  • 1
    @MartinD. I think this would be a good option for a first prototype. Just don't use bitmasks. As you noted, they do not scale at all. – Danvil May 20 '14 at 21:56