Efficient way of storing and retrieving complex objects in C++?

Question

I'm currently working on a small dungeon simulation game. The game is quite detailed and i'm planning to have, over time, +200k instances of a class that represents a "monster". They contain perks, skills and history of that monster. Things like how many potions he has used, where he lives, what are his patrol routes, etc.

I started implementing this with SQLite and using a simple table called "monsters", which contained all the data. This allowed me to use SQL queries to find the monsters needed for simulation calculation on each frame. For example: finding all monsters who patrolled point A, or finding all monsters who used Potion X, etc. Unfortunately, querying SQLite several times on every frame quickly slowed down the game. Even though it's a 2D game, i need the precious milliseconds for simulation calculations.

Also, i was going to need JOIN's in the future to do graphs: i need to know if a monster has attacked another monster or if a monster is part of the team of another monster. That would slow things even further.

Does anyone have any suggestion on how to approach this?

My data resembles something like this:

Two main techniques to increase lookup speeds are deliberately introduced redundancy, and caching. You might want to preprocess cached data to a form that's more efficiently accessed for the most common use cases. And you might think about predictive pre-fetching during times where not much else is going on. — Cheers and hth. - Alf, May 29 '16 at 04:51
Story: In engineering college my lecturer in some course didn't at first believe that my Othello program was actually computing its moves; not because they were dumb (they were so-so), but because they were apparently instant. I had to show him on a side terminal what was going on inside, that while the program waited for the user's next move, it was quite busy calculating its own response to that. — Cheers and hth. - Alf, May 29 '16 at 04:53
Presumably there will not be 200,000 monsters on-screen (or nearly-on-screen) at the same time (I know monitors are bigger now, but they aren't *that* big ;)) ... so one simple way to speed things up would be to make a smaller table (or perhaps just a plain old data structure) containing only the monsters that the user can actually see at the moment. The contents of the small table could be written back to the big table from time to time, when appropriate. — Jeremy Friesner, May 29 '16 at 05:58
try some `entity component system` library, [entityx](https://github.com/alecthomas/entityx) for example — fnc12, May 29 '16 at 07:02
Do you intend to run a query in a relational database for each frame you render? That is probably the wrong way to do it. The time scale for SQL queries is slower than that. The strengths of a relational database are flexibility, somewhat advanced searches, and large amounts of data, not very fast responses. — Thomas Padron-McCarthy, May 29 '16 at 07:05
Rather than thinking of storing *objects*, consider storing *events* or *relations*. — , May 29 '16 at 07:09
The fact that they are on the screen or not, doesn't really matter. It's a simulation game, so i have to consider changes in data on every tick. — vinnylinux, May 30 '16 at 01:37
@Hurkyl, i've considered using Event Sourcing to solve this problem, but finding the relations seems to be much more expensive than just using duplication. — vinnylinux, May 30 '16 at 01:38
*Does anyone have any suggestion on how to approach this?* - don't use SQL. Load any persistent state when the game starts, and write it to disk when the game stops. Use the in-memory data tied to some contiguous memory to support the algorithms required present much faster alternatives. — Niall, Jun 01 '16 at 14:39
I'm using SQL to find data more easily, not for persistence. I have lot's of complex queries and graph-relationship between the objects, which SQL makes very easy to write and find the correct ones. — vinnylinux, Jun 01 '16 at 14:43
You might have a container of references to monster objects for each query that you'd want to run. For example, one for monsters with a certain potion. Add and remove the references in the `has_x_effect` container as you go, so that you can easily iterate through that container. For relationships, a monster would either own a list of teammate IDs, or remember which team it belongs to. The members of each team are stored in anther structure that maps a team name to a list of monsters. (This is from a "containers in memory" perspective; I'm sure you can easily convert the ideas to tables). — Jim V, Jun 02 '16 at 20:22
I've decided to tackle this with boost::graph. I can walk the three and make questions about the relationships of nodes pretty fast. — vinnylinux, Jun 02 '16 at 21:21
If you already have something working with SQLite then perhaps you should take a look at an in memory SQL database. See http://stackoverflow.com/questions/3814609/does-anybody-have-any-experience-with-fastdb-c-in-memory-database for an example. Or perhaps compile SQLite as an in memory database https://www.sqlite.org/inmemorydb.html — Richard Chambers, Jun 07 '16 at 03:08
I think for retrieval of the data map is best possible data structure as we can access data in O(1) time only so you can use the map to store objects. — sitaram chhimpa, Aug 30 '17 at 08:56

score 8 · Answer 1 · edited Jun 02 '16 at 23:21

8

If you are not using an entity component system, consider migrating your game to using one. Each bit of related data can be stored in a self-contained component, and the entity itself is some opaque identifier that identifies the component. When using an ECS, rather than have a game entity with a bunch of data hanging off it, you reverse the relationship. All components of a specific type exist in a big pool together, and your entity is just an identifier that specifies which component in this pool they care about. What this allows you to do is batch component updates. If you have an Inventory component on each monster with an inventory, all of your Inventory components can be stored more or less contiguously in memory. This means that when processing them, you have high cache coherency which can be a significant performance boost.

It also might be that you're just trying to do too much each frame. With an entity component system you can throttle specific subsystems to every X frames or every X seconds if you need to. Maybe the AI component only needs to run once a second to think about what to do next, but they need to keep moving continuously so you update position every frame. Or maybe putting together one of the charts and graphs takes too long to complete in one frame so you have it calculated every second frame, or split the processing over two frames so that you iterate over half your entities on frame and the rest on the second frame. There's a lot of ways you can split it up.

Here is some more reading on component systems if you haven't seen them before

edited Jun 02 '16 at 23:21

gsamaras

71,951
46
188
305

answered Jun 01 '16 at 14:51

Alex

14,973
13
59
94

An ECS would not solve the problem. It would just make my logic separate into components, but i would still need to store a lot of information. – vinnylinux Jun 02 '16 at 21:27
1

Correct me if I'm wrong, but your problem appears to be that you can not do the operations you want quickly enough with your database. A database is really not the right tool for the job here. How big is your monster structure? Is it too big to keep all 200k in memory the whole time? Using an ECS allows you to have all your relevant data kept contiguously in memory which helps to minimize cache misses which can be a major source of slowdown. I'm not suggesting you do it for organizational purposes, but as an optimization to allow you to do more per frame. – Alex Jun 02 '16 at 21:33
My data resembles something like this: http://farm3.static.flickr.com/2450/3614560130_aaafa37387.jpg?v=0 – vinnylinux Jun 02 '16 at 23:02
1

Adding an ECS won't solve my issue. I need fast ways to query and traverse my data, not only efficient ways of storing it. ECS would just allow me to store them efficiently, but it would be a nightmare to handle the relationships across components and their data. – vinnylinux Jun 02 '16 at 23:03
Ask yourself: how do i know how many monsters have used option X, were damaged more than X times, use armor of brand X, and are friends with monster A a B? – vinnylinux Jun 02 '16 at 23:04
It's hard to formulate a proper answer without knowing the full game design and knowing all the constraints you're working with. Is this a check that needs to happen every frame? Can you just create a query structure and every time option X runs you see if they're wearing armor X? Does it need to be lockstep or it run in the background while the main loop continues? How much data is there? Does it all fit in memory? How many actions occur every frame that you need to watch for? I feel like this is better as a discussion question on a game dev forum, there isn't one right answer here. – Alex Jun 02 '16 at 23:11

Mike · Accepted Answer · 2017-09-04T09:33:17.473

Based on the former answers/comments and your answers for them, I assume that you decided to continue using SQL but you want a better way for reading/processing it. Therefore, my answer focuses that aspect only, and is related solely to design, rather than other suggestions (like using different SQL engines).

The main issue in your DB, as it reflects in your original question, is that all the aspects of a given monster lay in a single record, all in the same table. That makes the table huge over time. Besides that, this design makes your DB and code hard to maintain and evolve.

While it might sound "right" to hold all the details of a monster in a single record (and, maybe, a single object that reflects it in the code itself), this is rarely a good decision. You should have a clear separation between an object attributes as modeled in your "world" to those attributes modeled in software. For example, the location of a monster is probably being changed very frequently, while its name is not. Since you hold all of the data in a single table, any change to the location is done on the very same table that holds all the other attributes, which makes it a very heavy action.

What you should do, instead, is to:

Separate the data into different tables
Choose good indices to each table
Use joins as necessary (based, for example, on the monster's ID)

That way, any read and change is done only in the relevant context; for example, changing the location of a monster will only change the locations table, without affecting tables of more persistent details. Joins should not take lots of time, as long as you have good index and you filter only those data that interest you. In addition to speeding up your queries, this design is much more flexible. For example, if you want to add a new type of attribute to monsters, you can just add a new table, or use an existing table with similar data (e.g. monster's equipment) to add hold it.

If your queries rely a lot on "geographic" locations, and you still want to handle it using a relational DB, you might consider other types of DBs that have better support in spatial queries.

HTH!

score 3 · Answer 3 · answered May 29 '16 at 07:00

3

As I understood, the main problem for you is find optimized way to store your monsters. For example, you can use some tree data structures to find effectively needed monsters on plane. One of these data structures is BSP (binary search partition) tree. It is brief description https://en.wikipedia.org/wiki/Binary_space_partitioning. Qt's graphic view framework uses this approach. For more information about it you can see at http://doc.qt.io/qt-4.8/graphicsview.html

answered May 29 '16 at 07:00

Alex Aparin

4,393
5
25
51

Maybe i didn't explain myself correctly. The problem is not rendering, but accounting for all the simulation information available at every tick of the game. – vinnylinux May 30 '16 at 01:39

score 3 · Answer 4 · answered Aug 28 '17 at 11:43

There hasn't been an accepted answer, so I'll give your question a shot. But I'll be honest with you ;-)

First: You did not specify too many details, this makes a really well fitting answer difficult, if not impossible. For me, it's not clear how much data you want to process in what time. You mention frames, but is this a frame like in frames per second or is it more like what I'd call "a world tick"? In case you want to run the game at >30fps and update the whole world state 30 times per second: Nope, I suppose you can forget doing that (we did a panic simulation with about 1,000 nodes/persons as part of a CUDA lecture. And while that's way more simple than your simulation, it was barely able to run in real time on a GTX780; so I'll assume a seasoned CUDA developer would most likely hit a limit with 10,000 nodes on that hardware - and you want to have >20x the nodes with more a way more complex AI/simulation than "run away from fire to the next visible exit and trample other people if your panic level is too high + simple 2D wall collision").

But if by frame you mean world simulation tick, then yes, this could be doable.

There are again some details missing from your questions now: Do you plan to develop a dedicated MMO server with >200k monsters and thousand players, or is it a local host single player? Or something in between (networked multiplayer RPG, say max 16 players)?

If you only have a few players (I guess so, since you said 2D; and there isn't a huge difference between one player or four): Don't do all the simulation in full detail at once. For full immersion, it is enough to have a detailed simulation in vicinity of the player(s). Like in pen&paper: As a game master (GM) you usually only have a few key events happening across the world, and you make up the rest as you go/have some rough outline what's happening elsewhere but no exact details. If you're a good GM, that's convincing enough for the players, because who cares if the about the current position of a guard in some throne room 50 miles away?

I have a feeling you want to do a "proper, fully simulated game with full social interaction between the NPCs/monster, because no one else is doing something like that" (correct me if I'm wrong), but there is a good reason no one is doing that: It's quite hard.

Idea: If you think in "zones", you only need to run the simulation in those zones the players are currently active. All other zones are frozen. Once the player(s) switch zones, you can just unfreeze and fast-forward the new zone. You can either do this in full detail, or approximate it. If you don't want loading screens, you can unfreeze and fast-forward zones in the vicinity of the players which they might enter.

On top of that, you should thing about your architecture. For example, you mention you want to know what monster did trink potion X. Of course this is simple to write down in SQL, but you won't get happy with the performance. Or at least I don't think I would, and that's after one basic database lecture and a "let's write a high performance SQL server"-advanced-lecture [full disclosure: I'm bad at writing high performance SQL queries since I don't usually use SQL]. Plus: Who needs full ACID for a game? Okay, for simplicity you could put stuff you don't really need that often into a SQL DB (monster height, weight, flavor texts,...?), or ECS, or what-ever technique you deem best. But everything you want to touch every few seconds could go into memory. I mean, if you store 1kByte per monster, than you're at ~200MByte memory for 200k monsters.

Anyway, back to the question "which monsters did trink potion X?": Why would you want to know that? To apply effects? To check if effects wear of? I'd be using a event queue for that: A monster drinks a potion of strength -> Update inventory, give it bonus STR, compute a timeout and queue a event "bonus STR wear of for ". That's probably even faster than processing 200MByte of memory, since you're only doing "what needs to be done", per tick - and not "checking everything for every possible condition, every tick".

Also, think be careful with your algorithms: You have person X knows person Y relationship, annotated with "public/private"? Depending on what you want to do on that graph you might run into NP-hard problems. For example you might have a bad time if you accidentally try to solve a derivative of the clique problem.

I could add more, but since the question is a bit vague I'll just stop here and hope you might get some good ideas.

Thank you for the time in writing such a detailed answer. I'm going to try to provide more info and context: the game is not 3D, i'm using libtcod for graphics. I'm more focused on getting a nice simulation going and using minimal visual representation of it. The reason why i want to simulate everything is because the player has no enemies, the player is "god" itself. So, in order to manage everything, he needs to see everything. It's like The Sims, on a larger scale and less control. — vinnylinux, Aug 28 '17 at 12:43
In case of a rogue like, I'd even more suggest you'd not try to simulate the whole world with every tick/frame/... — ArchimedesMP, Aug 28 '17 at 16:23
Instead think about zoning and event queue system. My terminal is 316x93 = 29388 chars, so I still doubt the user will really see 200k monsters at once ;-) The AI/sim tasks is still quite hefty... You might take a look at climate simulation. Maybe some things can be managed via a simplified background sim (Elite: Dangerous does that), while the current screen + some distance from there is simulated in detail. I can not judge your knowledge on low level C++ data structures, but I'd build everything necessary up from there. [My warning about running into NP hard problems still holds]. — ArchimedesMP, Aug 28 '17 at 16:31
It doesn't really matter, at least to the current game design, if the player is seeing the monster on screen or not. If a tree falls, and no one is near, does it matter? They have a normal life, being simulated on every tick. The simulation must run at all times and that's what makes the game fun. — vinnylinux, Aug 29 '17 at 14:06
My prototype right now is being written in Kotlin with Neo4J as an in-memory data store. I didn't hit any performance walls so far, but i can see this exploding in the near future. I just want to be prepared when that happens, and learn something new along the way. — vinnylinux, Aug 29 '17 at 14:08

score 2 · Answer 5 · edited May 23 '17 at 10:34

2

I wouldn't query the SQL databse at every frame, but instead cache the monsters that you think are the most likely to be needed in calculations.

How many monsters should I cache?

If there is someone that knows, that is you! However, I think that only by experimenting you are going to find a sweet spot for the size of your cache. You search for implementations of cache and get inspired, like this.

Why use SQL at first place? Consider writing to the disk.

Your image suggests a graph, so why not store what you need as a graph? As you suggested, The Boost Graph Library (BGL) can come in handy! Check this too: https://stackoverflow.com/questions/2751826/which-c-graph-library-should-i-use

edited May 23 '17 at 10:34

Community

1
1

answered Jun 02 '16 at 23:31

gsamaras

71,951
46
188
305

Would boost be efficient enough for a game engine? Throughout the years in game dev, i've been always told to avoid the boost lib entirely. Yet, i can't find any decent, easy-to-use, graph library other than boost::graph. – vinnylinux Jun 06 '16 at 22:54
If you can avoid Boost, do so! However, I had used it for [kd-GeRaF[(http://arxiv.org/pdf/1603.09596.pdf) and it was OK. However I do not know to answer your question directly! The link that suggests more libraries should come in handy though! @vinnylinux – gsamaras Jun 07 '16 at 00:00

score 1 · Answer 6 · answered Aug 31 '17 at 16:54

There are 2 ways of limiting what is happening, and they can be used together.

Scheduling the events.
Limiting the data to the visible screen.

Scheduling the events

Assuming the monsters do not act every frame, the next action of a monster can be scheduled for N frames into the future, by working out just the actions which need to be performed on this frame, then the amount of monsters to be dealt with per-frame, can be reduced.

Limiting the data to the visible screen

sqlite has an R*Tree to support selecting only those items within a geographical region.

The screen can only show those monsters which are visible, this allows only those within the screen area have accurate simulation with other monsters getting a coarser simulation.

As far as I can tell, World of Warcraft has a time equation for a monster's position ....

MyXY = getPosition( me, currentTime );

This would be the natural state of the monster when it is in its stable 'not viewed state'. This means actions for the monster don't need to be simulated at all when they are not visible.

Efficient way of storing and retrieving complex objects in C++?

6 Answers6

Scheduling the events

Limiting the data to the visible screen