
I'm writing a little Entity Component System (ECS) in JavaScript (TypeScript, specifically) and it currently works, but I was wondering if it could be more efficient under the hood. The way an ECS works is that entities are basically just bags of components. So, a player entity might have a HealthComponent, PositionComponent, SpriteComponent, etc. Then you can create a RenderingSystem that queries all entities with a PositionComponent and a SpriteComponent and renders them. Like this:

for (let entity of scene.query(CT.Position, CT.Sprite)) {
  // draw entity
}

To make querying efficient, rather than iterating through every entity in the scene on every call to check whether it has a Position component and a Sprite component, we instead cache the result after the first query call and keep it updated. Every subsequent query call can then just return the cached list of entities instead of scanning the full entity list each time.
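A minimal sketch of what that caching looks like (the enum values, `Entity` shape, and `Scene` class here are illustrative assumptions, not my actual code):

```typescript
enum CT { Position = 2, Sprite = 3 }

interface Entity {
  id: number;
  components: Set<CT>;
}

class Scene {
  entities: Entity[] = [];
  // sorted component ids joined with "," -> entities matching that query
  private cache = new Map<string, Entity[]>();

  query(...types: CT[]): Entity[] {
    const key = [...types].sort((a, b) => a - b).join(",");
    let result = this.cache.get(key);
    if (!result) {
      // First call for this key: O(n) scan over all entities, cached afterwards
      result = this.entities.filter(e => types.every(t => e.components.has(t)));
      this.cache.set(key, result);
    }
    return result;
  }
}
```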

So, as an example, the cache might look like this:

{ "6,1,20" => Map(1) }
{ "2,3,1,6" => Map(1) }
{ "2,3" => Map(31) }
{ "9" => Map(5) }
{ "2,8" => Map(5) }
{ "29,24,2" => Map(5) }

// etc..

The numbers refer to the values of the enum members like CT.Position, CT.Sprite, etc. In this case, CT.Position is 2 and CT.Sprite is 3, and there are 31 entities that have both of those components. So when querying all entities that have those two components, we can just return that cached list rather than recomputing it each time.

This all works, but it's not very efficient, because adding (and removing!) an entity to/from the scene is an O(n) operation that also involves a lot of string splitting and concatenation: you need to iterate through every entry in the cache to see if the entity's list of components is covered by that entry.
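To make the cost concrete, here's roughly what that O(n) add looks like (a sketch with assumed names; `cache` maps a key string to the set of matching entities):

```typescript
// Every add re-checks the new entity against every cache entry,
// splitting each key string along the way.
function addEntityToCache(
  cache: Map<string, Map<number, unknown>>,
  entityId: number,
  componentIds: number[]
): void {
  const have = new Set(componentIds);
  for (const [key, entities] of cache) {
    // String splitting on every entry, for every add: the bottleneck
    const wanted = key.split(",").map(Number);
    if (wanted.every(c => have.has(c))) {
      entities.set(entityId, {});
    }
  }
}
```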

Is there any way to improve this to be more like O(log n) or preferably O(1)? Let me know if this is all clear, or if there are any details that need to be clarified.

Here's a link to the TypeScript Playground reproduction example

Ryan Peschel
  • Could you provide a [mcve] of a bare-bones implementation suitable for dropping into a standalone IDE to demonstrate how this works and what operations you need to support? – jcalz Mar 03 '21 at 01:59
  • @jcalz I was able to make an example but I don't know how to share it. I wrote some code on the Typescript Playground website, but the URL is too long to share it here. Also, I tried to use a URL shortener but they _also_ say the URL is too long to shorten. – Ryan Peschel Mar 03 '21 at 02:22
  • Here, I made a pastebin that holds a URL to the playground (this is insanity..): https://paste.ee/p/cissL – Ryan Peschel Mar 03 '21 at 02:23
  • You should be able to edit that into your question and not just put it in comments, I'd think. I'm not sure if it makes a lot of sense to cache particular queries if you plan on adding/removing entities frequently; I guess it really depends on usage patterns. If you're not adding/removing entities frequently then it might be reasonably performant to throw away your cache when this happens, instead of trying to repair the cache line by line. In any case I'd be reluctant to make any strong claims about performance without profiling typical usages. – jcalz Mar 03 '21 at 02:36
  • I don't add or remove entities that frequently. Querying probably happens 1000x more often than entity creation / deletion, but there are times where I would like to add or remove a large number of entities in a short period of time, and I've noticed frame-rate drops during this time, which is why I'm looking to hopefully reduce the time complexity of my add and remove operations. – Ryan Peschel Mar 03 '21 at 02:40
  • I was working on my own bare-bones thing that deals with "attributes" (just numbers) and not "components" (with more structure to them) because I'm focusing on just the query/add/remove operations: see [here](https://tsplay.dev/wXRn1m). The approach here is only to cache queries of a single attribute as `Set`s, and then intersect these sets when you query multiple attributes. Really can't say much about performance in practice, though. If you're interested in the approach, let me know and I can write up an answer with explanation; otherwise hopefully someone else has an idea. Good luck! – jcalz Mar 03 '21 at 03:07
  • 1
    Interesting snippet I'll have to read it over. Yeah, I recently realized my question is basically "how do I quickly find the intersection between N sets?" so I was googling around. Apparently bloom filters might work here? I'm reading about it now.. Not sure if I'm understanding properly but it looks like you might be able to do a set intersection on N bloom filters to find the matching items. https://stackoverflow.com/a/39176090/962155 – Ryan Peschel Mar 03 '21 at 03:15
  • Do you ever remove queries from the cache? – Matt Timmermans Mar 03 '21 at 13:56
  • So @RyanPeschel do you want an answer with intersections-of-sets or not? Can you give an estimate of the approximate number of objects of each type you're dealing with and the frequency of each operation? I have a hard time thinking bloom filters will be better than hash-maps or hash-sets for anything where all the data fits in memory. Have you tried [timing set intersections](https://tsplay.dev/Nn6nVN)? – jcalz Mar 03 '21 at 16:46
  • Hey @jcalz sorry about the delayed response I was sleeping. Yeah so I've been thinking about this some more and I'm not sure if there is really any way to optimize this further. The query operation HAS to be as fast as possible because it's run 10,000 times more often than add or remove operations. So my current query operation is probably as fast as it could be considering it just does a cache lookup of the key and then returns the list of all the entities. It's a simple `cache[key] -> Array` call which is ideal. – Ryan Peschel Mar 03 '21 at 17:10
  • @jcalz It would be ideal if it were possible to maintain that while also improving the performance of the add and remove operations, but I'm not sure if that could be done. I've been racking my brain trying to think of a way to improve the add / remove performance while also maintaining the query performance, but I just can't think of anything. – Ryan Peschel Mar 03 '21 at 17:11
  • Actually, I think I might have had an idea just now on how to improve the add / remove performance while maintaining the query performance, but I still need to think through it. What if we introduced another cache that mapped component lists to which caches they're added to? Basically, a cache mapping for the cache. Then for every unique grouping of components, you'd only need to do the O(n) computation the first time, and then all times after that would just be a simple lookup. – Ryan Peschel Mar 03 '21 at 18:06
  • Is it true that entities never gain or lose components over their lifetimes? – jcalz Mar 03 '21 at 18:31
  • @jcalz No that's not true. You can add or remove components from an entity at runtime. – Ryan Peschel Mar 03 '21 at 18:33
  • So far I have [this](https://tsplay.dev/Nd3Znw) but I'm pretty sure it's still doing O(n*k) on add/remove operations. Oh well – jcalz Mar 03 '21 at 22:25

2 Answers


I expect that the number of queries in the cache will be pretty small, since each query will be individually tied to a bunch of code that processes the results. So iterating over the query list and performing some operation for each one won't be that expensive, but if you have problems when adding or removing a whole bunch of entities, then that can certainly be addressed.

First, the string representation you use for a subset of component types is indeed pretty inefficient. There are lots of alternatives; maybe try something like this:

  1. First, assign an integer to each component type (you did this already)
  2. Sort the component types in the subset by their integer
  3. Build a string using each integer as a character

This representation is not too fancy, but it allows you to quickly get at the component types in a subset using charCodeAt(), and you can use that to test for subsets by walking through both strings simultaneously, or by walking through one while doing a binary search in the other.
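A sketch of that encoding and the merge-style subset test (function names are illustrative, and this assumes component ids fit in a single UTF-16 code unit):

```typescript
// Encode a subset of component-type ids as a compact string:
// sort the ids, then use each id as one character code.
function subsetKey(types: number[]): string {
  return [...types]
    .sort((a, b) => a - b)
    .map(t => String.fromCharCode(t))
    .join("");
}

// Test whether every type in `sub` also appears in `sup`, walking both
// sorted strings simultaneously (merge-style, O(|sub| + |sup|)).
function isSubset(sub: string, sup: string): boolean {
  let i = 0, j = 0;
  while (i < sub.length && j < sup.length) {
    const a = sub.charCodeAt(i), b = sup.charCodeAt(j);
    if (a === b) { i++; j++; }
    else if (a > b) { j++; }        // skip a type only `sup` has
    else { return false; }          // a < b: `sup` is missing this type
  }
  return i === sub.length;
}
```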

The real improvements, however, would come from grouping entities by the subset of component types that they present. There are lots of ways. I think something like this would work for you:

  • For each entity, precalculate its component-type-subset string
  • For each subset in use, maintain the list of cached queries that match that subset. This list only needs to be modified when you introduce a new query or a new subset.
  • When an entity is added or removed, get the queries for its subset and add or remove it directly to/from the results.
  • When you get a new query, make a set of the subsets it matches, add it to the query list for those subsets, and check each entity to see if its subset is contained in the match set.
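Those bullet points could be sketched like this (all names are illustrative, and comma-separated keys stand in for whatever canonical subset encoding you choose; backfilling a brand-new query from existing entities is omitted):

```typescript
type Key = string; // sorted component ids joined with ","

// Does every component id in query `q` appear in subset `s`?
function queryMatches(q: Key, s: Key): boolean {
  const subset = new Set(s.split(","));
  return q.split(",").every(id => subset.has(id));
}

// subset key -> cached queries that match it
const queriesBySubset = new Map<Key, Set<Key>>();
// query key -> ids of matching entities
const resultsByQuery = new Map<Key, Set<number>>();

// A new query only needs to be checked against each known subset once.
function registerQuery(q: Key): void {
  if (resultsByQuery.has(q)) return;
  resultsByQuery.set(q, new Set());
  for (const [subset, queries] of queriesBySubset) {
    if (queryMatches(q, subset)) queries.add(q);
  }
}

function addEntity(id: number, subset: Key): void {
  let queries = queriesBySubset.get(subset);
  if (!queries) {
    // First entity with this subset: compare it against every query once
    queries = new Set([...resultsByQuery.keys()].filter(q => queryMatches(q, subset)));
    queriesBySubset.set(subset, queries);
  }
  // Known subset: only the matching queries are touched
  for (const q of queries) resultsByQuery.get(q)!.add(id);
}

function removeEntity(id: number, subset: Key): void {
  for (const q of queriesBySubset.get(subset) ?? []) {
    resultsByQuery.get(q)?.delete(id);
  }
}
```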
Matt Timmermans

Okay, I think I have a tentative answer to this. It seems to work, but the code is complex enough that I'm not fully sure whether it's actually correct or only appears to be.

So, for the solution, I wanted to maintain the query performance, because querying is called by every system on every frame update and is therefore executed 1000x more often than entity creation / deletion. Currently, querying is an amortized O(1) operation: it first checks whether the cache contains this grouping of components (archetype). If it doesn't, it builds the list of entities for that archetype, and from then on that list is fetched from the cache. The cache is always kept up-to-date.

The issue in my question was that while the O(1) query operation was nice, the add and remove operations were O(n*k), where n is the number of distinct query operations (members of the cache) and k is the number of components in the entity. That is, whenever an entity was created or destroyed, the program had to iterate through each item in the cache and check whether the entity belonged to that query operation: if so, add it to that set; if not, remove it.

The idea I had this morning was to implement another cache / mapping. That is, the original cache mapped from a query component listing (archetype) to the set of entities that held those components. Example:

{ "6,1,20" => Set(1) }
{ "2,3,1,6" => Set(1) }
{ "2,3" => Set(31) }

Let's say 2 referred to the PositionComponent and 3 referred to the SpriteComponent. This means that all entities that contain those two components can be found within that set of 31 entities.

So, my tentative solution to my original question was to also have a mapping where a list of components corresponds to all cache entries they're a member of. That is, say we have an entity with the following components: 1, 2, 3, 6, 25. Then its corresponding entry in this cache would look like this:

1,2,3,6,25 => [ "2,3,1,6", "2,3" ]

The first time an entity of that archetype (component listing) is constructed, that list is manually created. However, afterwards it is simply maintained. Then, whenever there is a request to create an entity of that archetype, we can simply query this cache to find out which cache entries we need to modify.

That way, instead of iterating through the entire cache and checking every entry for membership, we simply query our secondary cache to determine which cache entries the entity belongs to. So, I believe the amortized complexity shrinks from O(n*k) to O(c), where c is the number of cache entries it's a member of.
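A sketch of that secondary cache (names are illustrative): the first entity of an archetype pays the O(n*k) scan, and the result is memoized for every later add or remove of the same archetype:

```typescript
// primary cache: query key -> matching entity ids
const primary = new Map<string, Set<number>>([
  ["2,3,1,6", new Set<number>()],
  ["2,3", new Set<number>()],
  ["2,8", new Set<number>()],
]);

// secondary cache: archetype key -> primary-cache keys it belongs to
const archetypeMembership = new Map<string, string[]>();

function membershipFor(archetype: string): string[] {
  let keys = archetypeMembership.get(archetype);
  if (!keys) {
    // First entity of this archetype: the one-time O(n*k) scan...
    const have = new Set(archetype.split(","));
    keys = [...primary.keys()].filter(k => k.split(",").every(c => have.has(c)));
    // ...memoized, so later adds/removes of this archetype are O(c)
    archetypeMembership.set(archetype, keys);
  }
  return keys;
}

function addEntity(id: number, archetype: string): void {
  for (const key of membershipFor(archetype)) primary.get(key)!.add(id);
}

function removeEntity(id: number, archetype: string): void {
  for (const key of membershipFor(archetype)) primary.get(key)!.delete(id);
}
```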

Ryan Peschel