I am having some problems designing a schema for dynamic properties with versioning. Assume the following use case:
I have a table called Actor
which contains the id
and a name
(to keep it simple). The upper bound for my case would be, that this table contains about 1 million entries.
Further every actor gets properties assigned to them. Because I don't know the properties at that time, I need a table to manage properties. I thought about a Property
-table. The resulting n:m relation will be resolved with a table between Actor
and Property
containing their primary keys and the property value (type?).
At the moment this seems pretty easy to handle. With one million entries each having 10 properties, the ActorProperty
table would have ten million nodes. I believe with btree
index (log2(n)) this should be no problem.
Now comes the part I am struggling with. The properties should somehow be tracked. Over time these properties change, but the history should not get lost. Most likely it will be done with a timestamp. Note that multiple properties get updated at the same time. An example would be: I take a snapshot of all actors every day, and if something changes, I'll update all changed properties at the same time. This results in 365 timestamps per year.
If I would use another table to manage the versions (timestamps) and add another foreign key to the ActorProperty
table I would get 365 * 10 million entries. This should be the maximum I would ever get. Most of the time the datasets will be significantly smaller.
My question now is more addressing the performance. I read the following answer about indexes: How does database indexing work. Is it not awfully slow to query a table with that amount of entries? An example query would be: First 100 actors with all their properties at a given timestamp id=x. Also I feel like the schema I'm thinking of is maybe not the best. Does anyone have any suggestions or ideas for a schema with more scalability?
By the way, I am currently also evaluating NoSql approaches, so I'd like to concentrate on relational approaches for the time being. My aim is to gather advantages and disadvantages of the different technologies and then have a theoretical schema or model for the described use case. And performance with an optimal model on a relational database is something I seemingly have trouble to evaluate or find.
Thanks!