We have a requirement to create a table that maintains temporal validity (i.e. for a given key, in this case Md5 in the table below, there will be no overlapping periods). Users need the ability to set the EffectiveFrom
and EffectiveTo
dates therefore the temporal tables are not useful as they only appear to allow system generated dates. The use case is that bulk data will be uploaded with the validity date range set and this needs to be applied to the existing data to ensure that there are no period overlaps.
Table definitions:
IF OBJECT_ID('dbo.IngestedData', 'U') IS NOT NULL
DROP TABLE IngestedData;
CREATE TABLE IngestedData
(
ID INT IDENTITY(1,1),
Md5 VARCHAR(15) NOT NULL,
EffectiveFrom DATE NOT NULL,
EffectiveTo DATE NOT NULL,
UpdateUser VARCHAR(50),
JsonData VARCHAR(MAX),
CONSTRAINT CK_IngestedData_Start_End CHECK (EffectiveFrom < EffectiveTo),
CONSTRAINT UK_IngestedData_Md5_Start_End UNIQUE(Md5, EffectiveFrom),
PRIMARY KEY (Id)
);
CREATE NONCLUSTERED INDEX AK_IngestedData_Md5
ON IngestedData (Md5);
CREATE NONCLUSTERED INDEX AK_IngestedData_EffectiveFrom
ON IngestedData (EffectiveFrom);
CREATE NONCLUSTERED INDEX AK_IngestedData_EffectiveTo
ON IngestedData (EffectiveTo);
I have written an upsert procedure that works well for single row updates as follows:
Upsert procedure:
CREATE PROCEDURE dbo.usp_UpsertIngestedDataRow
@Md5 VARCHAR(20),
@EffectiveFrom DateTime,
@EffectiveTo DateTime,
@UpdateUser VARCHAR(50),
@JsonData VARCHAR(MAX)
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY;
BEGIN TRANSACTION;
--Select the data that needs to be modified along with the action to be taken
WITH NewRow(ID, Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData, [Action]) AS
(
SELECT NULL, @Md5, @EffectiveFrom, @EffectiveTo, @UpdateUser, @JsonData, 'I'
),
OverlappingRows(ID, Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData) AS
(
SELECT
X.ID, X.Md5, X.EffectiveFrom, X.EffectiveTo, X.UpdateUser, X.JsonData
FROM
NewRow A
JOIN
IngestedData X ON (X.EffectiveFrom < A.EffectiveTo
AND X.EffectiveTo > A.EffectiveFrom)
AND A.Md5 = X.Md5
),
NewStartRows(ID, Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData, [Action]) AS
(
SELECT
s.ID, s.Md5, s.EffectiveFrom,
(SELECT DATEADD(DAY, -1, MIN(EffectiveFrom))
FROM NewRow),
s.UpdateUser, s.JsonData, 'I'
FROM
OverlappingRows s
WHERE
EffectiveFrom < (SELECT MIN(EffectiveFrom) FROM NewRow)
),
NewEndRows(ID, Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData, [Action]) AS
(
SELECT
s.ID, s.Md5,
(SELECT DATEADD(DAY, 1, MIN(EffectiveTo))
FROM NewRow),
s.EffectiveTo, s.UpdateUser, s.JsonData, 'I'
FROM
OverlappingRows s
WHERE
EffectiveTo > (SELECT MAX(EffectiveTo) FROM NewRow)
),
DeleteRows(ID, Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData, [Action]) AS
(
SELECT
del.ID, del.Md5, del.EffectiveFrom, del.EffectiveTo,
del.UpdateUser, del.JsonData, 'D'
FROM
OverlappingRows del
INNER JOIN
NewRow n ON n.EffectiveFrom <= del.EffectiveFrom
AND n.EffectiveTo >= del.EffectiveTo
)
SELECT *
INTO #Temp
FROM
(SELECT * FROM NewRow
UNION
SELECT * FROM NewStartRows
UNION
SELECT * FROM NewEndRows
UNION
SELECT * FROM DeleteRows) AS Data;
--Delete any rows that are being replaced
DELETE FROM IngestedData WHERE ID IN (SELECT DISTINCT ID FROM #Temp)
--Insert the replacement
INSERT INTO IngestedData(Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData)
SELECT Md5, EffectiveFrom, EffectiveTo, UpdateUser, JsonData
FROM #Temp
WHERE [Action] = 'I'
--Drop temp table
IF OBJECT_ID('tempdb.dbo.#Temp', 'U') IS NOT NULL
DROP TABLE #Temp
COMMIT;
END TRY
BEGIN CATCH
ROLLBACK;
THROW;
END CATCH
END
GO
The individual call performance is good with a call taking around 7ms even when the table is populated with 10,000,000 records. The issue is making large number of updates. Executing the above stored procedure for 35,000 records via a cursor takes around 5 minutes.
I tried rewriting the procedure to take a table variable which would allow the DML to use set operations but got lost in the logic. Can anyone help to turn the above logic in to a set based update that follows this pattern:
New stored procedure:
CREATE PROCEDURE usp_BulkUpsertIngestedData
@UpdateUser VARCHAR(15),
@NewRows DataIngestionRecord READONLY
AS
BEGIN
Type definition
CREATE TYPE DataIngestionRecord AS TABLE
(
Md5 VARCHAR(15) NOT NULL,
EffectiveFrom DATE NOT NULL,
EffectiveTo DATE NOT NULL,
JsonData VARCHAR(MAX)
)