
I have 2 tables, Table-A and Table-A-History.

  • Table-A contains current data rows.
  • Table-A-History contains historical data rows.

I would like to have the most current row of my data in Table-A, and Table-A-History containing historical rows.

I can think of 2 ways to accomplish this:

  1. Whenever a new data row is available, move the current row from Table-A to Table-A-History, then update the Table-A row with the latest data (via INSERT INTO ... SELECT or SELECT ... INTO).

    or

  2. whenever a new data row is available, update Table-A's row and insert a new row into Table-A-History.

With regard to performance, is method 1 or 2 better? And is there a better way to accomplish this altogether?

Mausimo
  • 2
    Have you considered using triggers on `Table-A` to create the `Table-A-History` rows for you? Make sure that they are set to fire last ([sp_settriggerorder](http://msdn.microsoft.com/en-us/library/ms186762.aspx)). – HABO Aug 09 '12 at 19:59
  • No I have not. I will look into triggers. Thanks. – Mausimo Aug 09 '12 at 20:01
  • Probably related with https://stackoverflow.com/a/70596143/717267 – Eduardo Cuomo Jan 05 '22 at 17:04

8 Answers

67

Basically you are looking to track/audit changes to a table while keeping the primary table small in size.

There are several ways to solve this problem; the pros and cons of each are discussed below.

1 - Auditing of the table with triggers.

If you are looking to audit the table (inserts, updates, deletes), see my "How to prevent unwanted transactions" SQL Saturday slide deck with code: http://craftydba.com/?page_id=880. The trigger that fills the audit table can hold information from multiple tables, if you choose, since the data is saved as XML. Therefore, you can un-delete a row if necessary by parsing the XML. It tracks who made the change and what the change was.

Optionally, you can place the audit table in its own filegroup.

Description:
    Table Triggers For (Insert, Update, Delete)
    Active table has current records.
    Audit (history) table for non-active records.

Pros:
    Active table has smaller # of records.
    Index in active table is small.
    Change is quickly reported in audit table.
    Tells you what change was made (ins, del, upd)

Cons:
    Have to join two tables to do historical reporting.
    Does not track schema changes.

2 - Effective dating the records

If you are never going to purge the data from the audit table, why not mark the row as deleted but keep it for ever? Many systems like people soft use effective dating to show if a record is no longer active. In the BI world this is called a type 2 dimensional table (slowly changing dimensions). See the data warehouse institute article. http://www.bidw.org/datawarehousing/scd-type-2/ Each record has a begin and end date.

All active records have a end date of null.

Description:
    Table Triggers For (Insert, Update, Delete)
    Main table has both active and historical records.

Pros:
    Historical reporting is easy.
    Change is quickly shown in main table.

Cons:
    Main table has a large # of records.
    Index of main table is large.
    Both active & history records in same filegroup.
    Does not tell you what change was made (ins, del, upd)
    Does not track schema changes.
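
A sketch of the effective-dating pattern described above (the table and column names are illustrative, not from the post):

```sql
-- Hypothetical Type 2 table: every version of a row carries a begin/end date.
CREATE TABLE [dbo].[CUSTOMER_DIM]
(
  [SURROGATE_ID]  INT IDENTITY(1,1) PRIMARY KEY,
  [CUSTOMER_ID]   INT NOT NULL,                      -- business key
  [CUSTOMER_NAME] NVARCHAR(100) NOT NULL,
  [BEGIN_DATE]    DATETIME NOT NULL DEFAULT (GETDATE()),
  [END_DATE]      DATETIME NULL                      -- NULL = active record
);
GO

-- On change: close out the active record, then insert the new version.
UPDATE [dbo].[CUSTOMER_DIM]
SET [END_DATE] = GETDATE()
WHERE [CUSTOMER_ID] = 42 AND [END_DATE] IS NULL;

INSERT INTO [dbo].[CUSTOMER_DIM] ([CUSTOMER_ID], [CUSTOMER_NAME])
VALUES (42, N'New Name');
```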

3 - Change Data Capture (Enterprise Feature).

Microsoft SQL Server 2008 introduced the change data capture (CDC) feature. While CDC tracks data changes using a log reader after the fact, it lacks information such as who and what made the change. MSDN details: http://technet.microsoft.com/en-us/library/bb522489(v=sql.105).aspx

This solution depends on the CDC jobs running; any issue with SQL Server Agent will delay changes showing up.

See change data capture tables. http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx

Description:
    Enable change data capture

Pros:
    Do not need to add triggers or tables to capture data.
    Tells you what change was made (ins, del, upd) via the _$operation field
    in the <user_defined_table>_CT table.
    Tracks schema changes.

Cons:
    Only available in enterprise version.
    Since it reads the log after the fact, time delay in data showing up.
    The CDC tables do not track who or what made the change.
    Disabling CDC removes the tables (not nice)!
    Need to decode and use the _$update_mask to figure out what columns changed.
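
For reference, enabling CDC looks roughly like this (schema and table names are placeholders; see the TechNet links above for the full parameter list):

```sql
-- Enable CDC at the database level (Enterprise edition, sysadmin required).
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for one table; changes land in a cdc.<capture_instance>_CT table.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'MyTable',
    @role_name     = NULL;   -- NULL = no gating role required to read changes
GO
```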

4 - Change Tracking Feature (All Versions).

Microsoft SQL Server 2008 introduced the change tracking feature. Unlike CDC, it comes with all versions; however, it comes with a set of TSQL functions that you have to call to figure out what happened.

It was designed for synchronizing a data source with SQL Server via an application. There is a whole synchronization framework on TechNet.

http://msdn.microsoft.com/en-us/library/bb933874.aspx http://msdn.microsoft.com/en-us/library/bb933994.aspx http://technet.microsoft.com/en-us/library/bb934145(v=sql.105).aspx

Unlike CDC, you specify how long changes are kept in the database before being purged. Also, inserts and deletes do not record data; updates only record which fields changed.

If you are synchronizing a SQL Server source with another target, this works fine. It is not good for auditing unless you write a periodic job to figure out the changes.

You will still have to store that information somewhere.

Description:
    Enable change tracking

Cons:
    Not a good auditing solution
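
To illustrate, enabling and querying change tracking might look like this (database, table, and variable names are placeholders):

```sql
-- Enable change tracking at the database level; changes purge after 7 days.
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

-- Enable it per table; TRACK_COLUMNS_UPDATED records which columns changed.
ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);

-- Read everything changed since the version saved at the last sync.
DECLARE @last_sync_version BIGINT = 0;
SELECT CT.*
FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS CT;
```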

The first three solutions will work for your auditing. I like the first solution since I use it extensively in my environment.

Sincerely

John

Code Snippet From Presentation (Autos Database)

-- 
-- 7 - Auditing data changes (table for DML trigger)
-- 


-- Delete existing table
IF OBJECT_ID('[AUDIT].[LOG_TABLE_CHANGES]') IS NOT NULL 
  DROP TABLE [AUDIT].[LOG_TABLE_CHANGES]
GO


-- Add the table
CREATE TABLE [AUDIT].[LOG_TABLE_CHANGES]
(
  [CHG_ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
  [CHG_DATE] [datetime] NOT NULL,
  [CHG_TYPE] [varchar](20) NOT NULL,
  [CHG_BY] [nvarchar](256) NOT NULL,
  [APP_NAME] [nvarchar](128) NOT NULL,
  [HOST_NAME] [nvarchar](128) NOT NULL,
  [SCHEMA_NAME] [sysname] NOT NULL,
  [OBJECT_NAME] [sysname] NOT NULL,
  [XML_RECSET] [xml] NULL,
 CONSTRAINT [PK_LTC_CHG_ID] PRIMARY KEY CLUSTERED ([CHG_ID] ASC)
) ON [PRIMARY]
GO

-- Add defaults for key information
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_CHG_DATE] DEFAULT (getdate()) FOR [CHG_DATE];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_CHG_TYPE] DEFAULT ('') FOR [CHG_TYPE];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_CHG_BY] DEFAULT (coalesce(suser_sname(),'?')) FOR [CHG_BY];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_APP_NAME] DEFAULT (coalesce(app_name(),'?')) FOR [APP_NAME];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_HOST_NAME] DEFAULT (coalesce(host_name(),'?')) FOR [HOST_NAME];
GO



--
--  8 - Make DML trigger to capture changes
--


-- Delete existing trigger
IF OBJECT_ID('[ACTIVE].[TRG_FLUID_DATA]') IS NOT NULL 
  DROP TRIGGER [ACTIVE].[TRG_FLUID_DATA]
GO

-- Add trigger to log all changes
CREATE TRIGGER [ACTIVE].[TRG_FLUID_DATA] ON [ACTIVE].[CARS_BY_COUNTRY]
  FOR INSERT, UPDATE, DELETE AS
BEGIN

  -- Detect inserts
  IF EXISTS (select * from inserted) AND NOT EXISTS (select * from deleted)
  BEGIN
    INSERT [AUDIT].[LOG_TABLE_CHANGES] ([CHG_TYPE], [SCHEMA_NAME], [OBJECT_NAME], [XML_RECSET])
    SELECT 'INSERT', '[ACTIVE]', '[CARS_BY_COUNTRY]', (SELECT * FROM inserted as Record for xml auto, elements , root('RecordSet'), type)
    RETURN;
  END

  -- Detect deletes
  IF EXISTS (select * from deleted) AND NOT EXISTS (select * from inserted)
  BEGIN
    INSERT [AUDIT].[LOG_TABLE_CHANGES] ([CHG_TYPE], [SCHEMA_NAME], [OBJECT_NAME], [XML_RECSET])
    SELECT 'DELETE', '[ACTIVE]', '[CARS_BY_COUNTRY]', (SELECT * FROM deleted as Record for xml auto, elements , root('RecordSet'), type)
    RETURN;
  END

  -- Detect updates
  IF EXISTS (select * from inserted) AND EXISTS (select * from deleted)
  BEGIN
    INSERT [AUDIT].[LOG_TABLE_CHANGES] ([CHG_TYPE], [SCHEMA_NAME], [OBJECT_NAME], [XML_RECSET])
    SELECT 'UPDATE', '[ACTIVE]', '[CARS_BY_COUNTRY]', (SELECT * FROM deleted as Record for xml auto, elements , root('RecordSet'), type)
    RETURN;
  END

END;
GO



--
--  9 - Test DML trigger by updating, deleting and inserting data
--

-- Execute an update
UPDATE [ACTIVE].[CARS_BY_COUNTRY]
SET COUNTRY_NAME = 'Czech Republic'
WHERE COUNTRY_ID = 8
GO

-- Remove all data
DELETE FROM [ACTIVE].[CARS_BY_COUNTRY];
GO

-- Execute the load
EXECUTE [ACTIVE].[USP_LOAD_CARS_BY_COUNTRY];
GO 

-- Show the audit trail
SELECT * FROM [AUDIT].[LOG_TABLE_CHANGES]
GO

-- Disable the trigger
ALTER TABLE [ACTIVE].[CARS_BY_COUNTRY] DISABLE TRIGGER [TRG_FLUID_DATA];

** Look & Feel of audit table **


CRAFTY DBA
  • Great read, thanks for the insight! I just want to verify that I understood your code snippet. You only have 1 "Log Table Changes" table, that will store records from every other table and the actual records from those tables are stored in XML? That way you only ever have one audit table? – Mausimo Oct 02 '13 at 19:51
  • The cool part is that it is up to you. Let's say your company does taxes and your data retention period is 7 years; you might want to use several audit tables partitioned on chg_date. See my presentation on data warehousing techniques. On the other hand, if you sell ice cream, you might keep receipts for 2 years, and one table might be fine. – CRAFTY DBA Oct 02 '13 at 20:31
  • 1
    Shouldn't the "Cons" for triggers include increased insert/update/delete time? I feel like part of the right decision here is balancing write speed vs query speed. – Daniel Sep 29 '17 at 15:01
  • Under "effective dating the records" a _con_ is "index of main table is large." By using a [filtered index](https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-2017) the size of the index can be managed for the active rows. (This may require adding a column, e.g. `IsActive`, to use in the filter predicate.) Historical queries can use an index that cover all rows. The index storage is increased, but query performance can be maintained. – HABO Sep 10 '18 at 21:21
33

Recent versions of SQL Server (2016+ and Azure SQL) have temporal tables, which provide exactly the functionality requested as a first-class feature: https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
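
A minimal temporal-table sketch matching the question's naming (column names are illustrative):

```sql
-- System-versioned temporal table: SQL Server writes the history rows
-- automatically on every UPDATE and DELETE.
CREATE TABLE dbo.[Table-A]
(
    [Id]        INT NOT NULL PRIMARY KEY,
    [SomeValue] NVARCHAR(100) NOT NULL,
    [ValidFrom] DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    [ValidTo]   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME ([ValidFrom], [ValidTo])
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.[Table-A-History]));
GO

-- Query the data as it looked at a point in time.
SELECT * FROM dbo.[Table-A]
FOR SYSTEM_TIME AS OF '2022-01-01T00:00:00';
```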

Somebody at Microsoft probably read this page. :)

NightKnight
  • 1
    Thanks for adding to this. This question was actually from quite some time ago. Coincidentally, I am working on a new project that has a similar requirement and I am actually using Azure. I will be looking into Temporal-Tables. Cheers! – Mausimo May 02 '17 at 14:53
  • A big drawback with the new temporal tables and the reason I won't switch to them is they don't include the user who triggered the change in the tables, there are work-arounds but none of them straight-forward. – bendataclear Aug 12 '18 at 09:57
  • 1
    @bendataclear we solve this by adding a ChangeUser column to the original table.FWIW. – Billy Oct 26 '18 at 17:45
  • @Billy But the temporal tables hold the deleted row so this will show the previous edit user with the current edit time. – bendataclear Oct 27 '18 at 18:51
  • @bendataclear That is true, but the current record in the main table will show the last user to change the record. In order to determine the user that made the change you have to look at the next record in the series. – Billy Nov 13 '18 at 18:46
  • Will this solution suit my case? My scenario: for performance reasons, only the current data should stay in the table, so the remaining (history) data needs to be moved to a history table. – jAntoni May 24 '20 at 18:29
28

Logging changes is something I've generally done using triggers on a base table to record changes in a log table. The log table has additional columns to record the database user, action and date/time.

create trigger [Table-A_LogDelete] on dbo.[Table-A]
  for delete
as
  declare @Now as DateTime = GetDate()
  set nocount on
  insert into [Table-A-History]
    select SUser_SName(), 'delete-deleted', @Now, *
      from deleted
go
exec sp_settriggerorder @triggername = '[Table-A_LogDelete]', @order = 'last', @stmttype = 'delete'
go
create trigger [Table-A_LogInsert] on dbo.[Table-A]
  for insert
as
  declare @Now as DateTime = GetDate()
  set nocount on
  insert into [Table-A-History]
    select SUser_SName(), 'insert-inserted', @Now, *
      from inserted
go
exec sp_settriggerorder @triggername = '[Table-A_LogInsert]', @order = 'last', @stmttype = 'insert'
go
create trigger [Table-A_LogUpdate] on dbo.[Table-A]
  for update
as
  declare @Now as DateTime = GetDate()
  set nocount on
  insert into [Table-A-History]
    select SUser_SName(), 'update-deleted', @Now, *
      from deleted
  insert into [Table-A-History]
    select SUser_SName(), 'update-inserted', @Now, *
      from inserted
go
exec sp_settriggerorder @triggername = '[Table-A_LogUpdate]', @order = 'last', @stmttype = 'update'

Logging triggers should always be set to fire last. Otherwise, a subsequent trigger may roll back the original transaction after the log table has already been updated, which is a confusing state of affairs.

HABO
  • Is it possible to do this and record a user from elsewhere? Say someone authenticated to an intranet site? – Patrick Schomburg Aug 24 '18 at 18:58
  • 1
    @PatrickSchomburg The triggers only have reliable access to the user identity as SQL Server sees it. A website that uses a single SQL Server account to access the database for all web users presents a problem. A workaround is to use [`Session_Context`](https://docs.microsoft.com/en-us/sql/t-sql/functions/session-context-transact-sql?view=sql-server-2017) to store the website user's id and then pick up the value in the trigger for logging. That requires everyone to play by the same rules: _always_ set the user's id before doing anything else or bad things may happen, e.g. a stale id is logged. – HABO Aug 25 '18 at 02:30
  • That worked great. Logging a user is always problematic in systems where multiple users share a login. I mostly work on a system where the users are logged in to an ActiveUsers table and that works OK. The only change I made is recording the DELETE action in the Update trigger is redundant since that's already recorded from the insertion or previous change, so I removed that. Thanks. – Wade Hatler Sep 10 '18 at 20:33
  • 1
    A bit late, but another option is just to store a LastModifiedByUserID field in the base table, and then have your main update statement in the website store the userID to the base table. Then, in your logging trigger, you can add i.LastModifiedByUserID to the insert statement. – Katerine459 May 16 '19 at 18:39
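
The `Session_Context` workaround mentioned in the comments above can be sketched as follows (the key name `web_user_id` is an arbitrary choice; requires SQL Server 2016+):

```sql
-- The application sets the web user's id right after opening the connection.
EXEC sys.sp_set_session_context @key = N'web_user_id', @value = N'jsmith';

-- Inside the logging trigger, read it back, falling back to the SQL login
-- when the application did not set a value.
SELECT COALESCE(CONVERT(NVARCHAR(128), SESSION_CONTEXT(N'web_user_id')),
                SUSER_SNAME()) AS [CHG_BY];
```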
4

How about method 3: make Table-A a view over Table-A-History? Insert into Table-A-History and let appropriate filtering logic generate Table-A. That way you are only ever inserting into one table.
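
A sketch of this view-based approach, assuming each history row carries the business key and a timestamp (names are hypothetical):

```sql
-- All rows, current and historical, live in one table.
CREATE TABLE dbo.[Table-A-History]
(
    [Id]         INT NOT NULL,
    [SomeValue]  NVARCHAR(100) NOT NULL,
    [RecordedAt] DATETIME NOT NULL DEFAULT (GETDATE())
);
GO

-- Table-A becomes a view exposing only the newest row per key.
CREATE VIEW dbo.[Table-A]
AS
SELECT [Id], [SomeValue], [RecordedAt]
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [Id]
                              ORDER BY [RecordedAt] DESC) AS rn
    FROM dbo.[Table-A-History]
) AS ranked
WHERE rn = 1;
```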

mwigdahl
  • 1
    I thought I should separate the tables, as Table-A will hold ~10K records that are being used frequently. Where as the History table will get huge and be used a lot less. 5-10% of the time compared to table-A. Performance wise, would it not be better to have the database searching through 10K records frequently, rather than a huge table if I combined Table-A & Table-A-History – Mausimo Aug 09 '12 at 19:59
  • Possibly, possibly not, depending on the ratio of inserts to selects. You could also make Table-A an indexed view (http://msdn.microsoft.com/en-us/library/dd171921%28v=sql.100%29.aspx). This might resolve the search issue for you outright. – mwigdahl Aug 09 '12 at 21:26
4

Even though it consumes more space, having the history table contain the most recent record as well will save you pain when writing reports and tracing how and when changes occurred. Something worth thinking about, in my opinion.

As far as performance goes, I would expect them to be identical. But you certainly wouldn't want to delete the record (option 1's "move") from the non-history table if you are using referential integrity between the two tables, right?

UnhandledExcepSean
  • Right, I would update the record in table A. The Table-A-History table uses a surrogate key + a foreign key linking to Table-A – Mausimo Aug 09 '12 at 20:04
2

I would prefer method 1. In addition, I would also maintain the current record in the history table; it depends on the need.

Sankara
2

Option 1 is OK. But you have method 4 too :)

  1. Insert new record to your table,

  2. Move old records to the archive table on a regular basis using a scheduled job. You can schedule the archiving at a time of minimal load, for example at night.
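
The nightly archive step could be a single atomic statement using `DELETE ... OUTPUT` (a sketch; the `RecordedAt` column and the one-day retention rule are illustrative):

```sql
-- Move rows older than one day into the history table and delete them
-- from the active table in one statement.
DELETE A
OUTPUT deleted.* INTO dbo.[Table-A-History]
FROM dbo.[Table-A] AS A
WHERE A.[RecordedAt] < DATEADD(DAY, -1, GETDATE());
```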

Vahan
  • 1
    Oops, sorry. But the idea is the same: if you don't want to lose performance during day hours, just do it at night ;-) – Vahan Aug 09 '12 at 20:41
  • I think the question is not only about INSERTs, but also UPDATEs and DELETEs. What if an inserted row is updated or deleted on the same day? In that case, it is not possible to track changes which are done on the same day. (If the question is about only INSERTs, than an audit table is not needed, because no data will be changed using only INSERTs.) – MÇT Oct 28 '16 at 08:47
1

You can simply create a procedure or job to handle this, like so:

CREATE PROCEDURE [dbo].[sp_LoadNewData]
AS
INSERT INTO [dbo].[Table-A-History]
(
    [1.Column Name], [2.Column Name], [3.Column Name], [4.Column Name]
)
SELECT [1.Column Name], [2.Column Name], [3.Column Name], [4.Column Name]
FROM [dbo].[Table-A] S
WHERE NOT EXISTS
(
    SELECT * FROM [dbo].[Table-A-History] D
    WHERE D.[1.Column Name] = S.[1.Column Name]
)

Note: [1.Column Name] is a column common to both tables (the key).