10

I am currently exploring Change Data Capture as an option to store temporal databases. It is great because it stores only the deltas and seems like it may solve my problem. When I enabled CDC, a bunch of tables appeared under System Tables.

When querying cdc.dbo_MyTable, I am able to see all the changes that took place on the table. Now, how would I construct a historical view? For instance, if I wanted to see the state of my table as of a particular date, how would I go about doing that? Is that even possible?

It looks I need to take the log and start applying it over my original table but I was wondering if there is a built-in way of doing this. Any suggestions?

Some of the use cases I am looking at:

  • Know the state of the graph at a particular point in time
  • Given two graphs at different times, know the set of links that are different (this can probably be obtained using an EXCEPT clause after constructing the tables)
Legend
  • 113,822
  • 119
  • 272
  • 400

3 Answers3

6

it's possible, but not with a built-in way i'm a afraid. You would have to reconstruct the timeline by hand.

Given that the change-tracking tables offer the tran_end_time, which is the time that the value of the property should be perceived as persisted, you would have to make a query that fetches all the distinct periods of table states, join on the tracked property changes and then pivot (to have a presentation in the same form as the table). Don't forget to union with the table state itself to obtain the values that have not been changed/tracked for completeness.

The final result, simplified, should look like

RN PK PropA   PropB   FromDate          ToDate
1  1  'Ver1'  'Ver1'  2012-01-01 09:00  2012-01-02 08:00
2  1  'Ver1'  'Ver2'  2012-01-02 08:00  2012-01-03 07:00
3  1  'Ver2'  'Ver2'  2012-01-03 07:00  *getdate()*
4  2  'Ver1'  'Ver1'  2012-01-01 05:00  2012-01-02 06:00
5  2  'Ver1'  'Ver2'  2012-01-02 06:00  2012-01-03 01:00
6  2  'Ver2'  'Ver2'  2012-01-03 01:00  *getdate()*

note that the getdate() is valid if the row wasn't deleted in which case it should be substituted with the deletion date

EDIT, for the 2 use cases. The first point is easily addressed it's a matter of constructing the temporal object graph and then filtering:

declare @pointInTime datetime = '20120102 10:00';
select * from Reconstructed_TG where FromDate <= @pointInTime and @pointInTime < ToDate

the second point, can be generated easily with the EXCEPT clause, as you point out. given the above query:

declare @pointInTimeA datetime = '20120102 10:00';
declare @pointInTimeB datetime = '20120103 01:00';
select * from Reconstructed_TG where FromDate <= @pointInTimeA and @pointInTimeA < ToDate
EXCEPT
select * from Reconstructed_TG where FromDate <= @pointInTimeB and @pointInTimeB < ToDate

yet the except clause only presents the rows that have at least one different column value; i don't know if that information is really meaningful to the human eye. Depending on your needs a query that works directly on the cdc data may be more appropriate.

Jaguar
  • 5,929
  • 34
  • 48
  • 1
    +1 Thank you for your time and effort! I will explore this deeper and adapt it. Awarded the bounty! :) – Legend Aug 21 '12 at 23:21
0

You may want to check out Snapshots, which have been built in to SQL Server since 2005.

These will be most useful to you if you only need a few timepoints, but they can help you track all of the tables in a complex database.

These are deltas, so Compared to a full copy of a database, however, snapshots are highly space efficient. A snapshot requires only enough storage for the pages that change during its lifetime. Generally, snapshots are kept for a limited time, so their size is not a major concern.

criticalfix
  • 2,870
  • 1
  • 19
  • 32
  • 4
    Snapshots can be useful if you know *ahead* of time the times at which you want to keep a copy of the data. If you're wondering "what was the state at 2pm yesterday", and you didn't take a snapshot at 2pm yesterday, then they're not going to help. – Damien_The_Unbeliever Aug 15 '12 at 06:41
0

I'm not sure about this, never done anything like that, but maybe you can add a column "changeset" to the table that can keep track of the changes you have on the table, every time there's a transaction get the max(changeset) and save the new cahnges with the next value... Or if you have a timestamp and want to know the status of your table at certain time do querys to filter changes previous to the date you want to check... (Not sure if I should write this is as an answer or a comment... I'm new here)

Anyway, hope it helps...

saul672
  • 737
  • 5
  • 6