3

The question is about SQL on SQL Server 2012.

Background:

There is an entity called Entity.

Entity can have N EntityVersion (1:n).

EntityVersion may be approved or not. If it is approved, an attribute ApprovedBy will be assigned in the whole version.

Requirement:

There is the requirement of listing the latest entity versions of all entities, but only the latest approved version of each entity.

Problem:

I need to find an optimal way of achieveing the whole query for a large-scale solution.

For that reason, I have thought two possible approaches:

  1. IsLast flag column on EntityVersion. Whenever a new version is added, the whole transaction will set true to the newly added version, and the previously last version will be assigned with false.

  2. A trigger that will handle every new entity version insertion and will add the new version to a special table which will have the latest versions only. Whenever a new version is added, the previous is dropped from the whole special table. That is, the SELECT to obtain the whole listing will be very cheap.

Question:

What could be the optimal and valid approach? Do you have any other approach in mind?

Thanks!

Note that "optimal" means that will be scalable and it will work with milions of records.



Update

Since I've found that some users in their answers are requesting how is the schema of Entityand EntityVersion, I'm going to describe them better.

  • Entity has no textual information: just an identifier and relations with other entities.

  • EntityVersion has textual and other information. For example: Title, Description, Author... It's important to mention that EntityVersion does not have any duplicated information from Entity.

The easiest solution would be having something like a SELECT DISTINCT that could both define the distinct column and select other columns in the same query. Sadly, as far as I know, this isn't possible in SQL Server (am I wrong?).

Community
  • 1
  • 1
Matías Fidemraizer
  • 63,804
  • 18
  • 124
  • 206

1 Answers1

1

I think that IsLast flag column is sufficient, but....

  1. First question you should answer is: what's the ratio of Entities to EntityVersions? This can influence your decision.
  2. Also, I'm not sure on your table structure, but I would Assume that EntityVersion table is a copy of Entity table that stores changed fields (Or just diffs maybe?). If that's a case, why not store the latest version in Entity itself. I think this is a more semantically correct approach.
  3. Usually, if you have correct indexes (Maybe try to incorporate columnstoreIndexes?) you don't have to perform any special work like triggers which can be expensive. Try to do your tests with regular Join with GroupBy (Or you don't even need a join if your EntityVersion table has all of necessary fields). Or probably this approach is going to be a little faster: https://stackoverflow.com/a/438990/1792936
  4. Again, you can only construct an optimal query based on your execution plan and speed tests.

  5. If everything else fails, maybe try to incorporate CQRS pattern or something similar

Community
  • 1
  • 1
RAS
  • 3,375
  • 15
  • 24
  • Please check the update in the bottom of my question in order to understand better what `Entity`and `EntityVersion` are. And about your points: 1) It can be great. Think about something like Wikipedia or StackOverflow itself (1 question can have hundred of versions). About the other points, I'm going to take a look, expect some feedback in a day! Thanks for your points. – Matías Fidemraizer Mar 10 '13 at 21:29
  • In addition, I believe that the 2nd approach could be someting like CQRS pattern. Am I wrong? – Matías Fidemraizer Mar 10 '13 at 21:43
  • I've read about the column store indexes. It has been a very very interesting reading and it's a very powerful tool. Both CQRS pattern + column store index may be a extremely powerful approach! – Matías Fidemraizer Mar 11 '13 at 09:09
  • Finally, I'm going to use the approach 2): `Entity` now has a `1:n` relation with `EntityVersion` called `LastApprovedVersionId`. This way, in the object-oriented model, whenever a version for some entity is approved, I set the whole `LastApprovedVersion` association. This is the way! Thanks for your suggestions and as I said in the previous comment, even if I'm not going to use column store this time, it has been a good discovery! – Matías Fidemraizer Mar 11 '13 at 21:08
  • @MatíasFidemraizer, I hope that works for you. Do not forget, if you still not satisfied with a performance after all query and database optimizations, perhaps hardware upgrade can help you. SSDs can make a significant difference. – RAS Mar 12 '13 at 17:08
  • Thanks. It's too early to talk about database optimization. I'm just in the 20% of the development. I was looking to go in the right direction! – Matías Fidemraizer Mar 12 '13 at 19:36