OK, here's a bit of a challenge. I might be missing the obvious, but so far I've been racking my brain without being able to come up with a decent solution. (I have some 'work-arounds', but I don't really like any of them.)
We are extending an existing table t_table_x with more fields. The table has an identity column that acts as PK and lots of data columns that (functionally) can contain duplicates. The problem is that the table's fields already take up a large part of the roughly 8,060 bytes of row space available on a page, and adding all the new fields would push (some) records over this limit. The solution seemed simple: we simply add a new table t_table_y that shares the same identity value, with an FK from y to x. For usability we then add a view that JOINs both tables and returns everything as if it sat in one large table. So far so good.
Thinking about usability again, it would be great if the user could also load the information (ETL) directly into the view, so they don't have to first insert one half into t_table_x and then the other half into t_table_y. At first I was sceptical, because this means the inserted (pseudo) table needs to be able to hold more than 8,060 bytes per record. Turns out, this works flawlessly!
But then the trouble began. When the trigger inserts the relevant columns into t_table_x, this generates the IDENTITY values we need in order to insert the other columns into t_table_y. The problem I've run into is that I have no way to know which identity value belongs to which original record in [inserted].
I could try the approach from How to write an INSTEAD OF INSERT trigger on a multi table view that works with identities?, but the problem there is that the #inserted copy won't work because of the row-size limit. A (bad-ish) work-around would be to make all fields varchar(max) so the data goes off-page. There would probably be some performance hit, but oh well...
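To make that work-around concrete, the numbered copy from that approach would look roughly like this (untested sketch; #numbered_inserted is just a name I picked). Converting to varchar(max) lets SQL Server push the long values off-row, so the copy no longer trips over the row-size limit:

SELECT row_id  = IDENTITY(int, 1, 1),
       value_a = Convert(varchar(max), value_a),
       value_b = Convert(varchar(max), value_b),
       value_c = Convert(varchar(max), value_c),
       value_d = Convert(varchar(max), value_d)
INTO   #numbered_inserted
FROM   [inserted]
-- the varchar(max) columns can be stored off-row, so records wider than
-- 8,060 bytes no longer make the SELECT ... INTO fail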
Another alternative I came up with was to use %%physloc%%, but it seems this doesn't work on the pseudo table inserted.
What surely will work is going RBAR with a cursor (sketched below), but... well... rather not =)
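Just to spell out what I mean by RBAR: inside the trigger it would be something along these lines, which works because SCOPE_IDENTITY() can be read after every single-row insert. I'd rather not do this for large loads though:

DECLARE @value_a varchar(3000), @value_b varchar(3000),
        @value_c varchar(3000), @value_d varchar(3000),
        @row_id  int

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT value_a, value_b, value_c, value_d FROM inserted

OPEN c
FETCH NEXT FROM c INTO @value_a, @value_b, @value_c, @value_d
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one record at a time: insert the x-half, capture its identity,
    -- then insert the y-half with that value
    INSERT t_table_x (value_a, value_b) VALUES (@value_a, @value_b)
    SELECT @row_id = SCOPE_IDENTITY()

    INSERT t_table_y (row_id, value_c, value_d) VALUES (@row_id, @value_c, @value_d)

    FETCH NEXT FROM c INTO @value_a, @value_b, @value_c, @value_d
END
CLOSE c
DEALLOCATE c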
Alternatively I could create #table_x and #table_y temp tables, insert into both tables and then start matching them up against [inserted] again. Because there could be duplicates all over the place (both in x and y), this could be a rather heavy operation (lots of data will be involved; I can't add any indexes to the pseudo table; etc.). Also, the example here is simplified; I'm actually adding 4 extra tables using the same logic.
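For completeness, my reading of how that matching would have to go (untested; I'm assuming a #table_x that holds the rows just written to t_table_x together with their freshly generated row_id values). Because of the duplicates, both sides need an extra ROW_NUMBER() over the wide varchar columns before they can be joined, which is exactly the part I'm not looking forward to:

;WITH new_x AS (
    -- rows just written to t_table_x, numbered per identical (value_a, value_b) pair
    SELECT row_id, value_a, value_b,
           occurrence = ROW_NUMBER() OVER (PARTITION BY value_a, value_b ORDER BY row_id)
    FROM   #table_x
),
src AS (
    -- the original records, numbered the same way so duplicates can be paired up
    SELECT value_a, value_b, value_c, value_d,
           occurrence = ROW_NUMBER() OVER (PARTITION BY value_a, value_b ORDER BY (SELECT NULL))
    FROM   [inserted]
)
INSERT t_table_y (row_id, value_c, value_d)
SELECT x.row_id, s.value_c, s.value_d
FROM   src s
JOIN   new_x x
  ON   x.value_a    = s.value_a
 AND   x.value_b    = s.value_b
 AND   x.occurrence = s.occurrence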
Anyone with a more elegant solution?
IF OBJECT_ID('v_test') IS NOT NULL DROP VIEW v_test
IF OBJECT_ID('t_table_y') IS NOT NULL DROP TABLE t_table_y
IF OBJECT_ID('t_table_x') IS NOT NULL DROP TABLE t_table_x
GO
CREATE TABLE t_table_x ( row_id int NOT NULL IDENTITY(1, 1)
PRIMARY KEY,
value_a varchar(3000) NOT NULL,
value_b varchar(3000) NOT NULL )
CREATE TABLE t_table_y ( row_id int NOT NULL
PRIMARY KEY,
FOREIGN KEY (row_id) REFERENCES t_table_x (row_id),
value_c varchar(3000) NOT NULL,
value_d varchar(3000) NOT NULL )
GO
CREATE VIEW v_test
AS
SELECT x.row_id,
x.value_a,
x.value_b,
y.value_c,
y.value_d
FROM t_table_x x
JOIN t_table_y y
ON y.row_id = x.row_id
GO
DECLARE @row_id int
INSERT t_table_x (value_a, value_b) VALUES (Replicate('A', 2500), Replicate('B', 2500))
SELECT @row_id = SCOPE_IDENTITY()
INSERT t_table_y (row_id, value_c, value_d) VALUES (@row_id, Replicate('C', 2500), Replicate('D', 2500))
GO
SELECT * FROM v_test
GO
-- this won't work
INSERT v_test (value_a, value_b, value_c, value_d)
SELECT t.name, Convert(varchar, t.object_id), c.name, Convert(varchar, c.column_id)
FROM sys.tables t
JOIN sys.columns c
ON c.object_id = t.object_id
GO
-- so we build an INSTEAD OF INSERT trigger
CREATE TRIGGER tr1_v_test
ON v_test
INSTEAD OF INSERT
AS
-- simply return results for now
SELECT * FROM inserted
GO
-- test
INSERT v_test (value_a, value_b, value_c, value_d)
SELECT t.name, Convert(varchar, t.object_id), c.name, Convert(varchar, c.column_id)
FROM sys.tables t
JOIN sys.columns c
ON c.object_id = t.object_id
-- as we can see, it works now, but (logically) we don't have a value in row_id (yet) =/
GO
ALTER TRIGGER tr1_v_test
ON v_test
INSTEAD OF INSERT
AS
-- how do I find/add a row identifier to [inserted]?
-- not allowed: the inserted pseudo table can't be updated
UPDATE [inserted] SET row_id ...
-- not available on the inserted pseudo table
SELECT *, %%physloc%% FROM inserted
-- not an option: the copy trips over the max in-row record size of 8,060 bytes (one page)
SELECT row_id = IDENTITY(int, 1, 1), value_a, value_b, value_c, value_d
INTO #numbered_temp_table
FROM [inserted]
-- ???
Update: while typing this I kept searching around and also found TSQL is expecting the identity column to be inserted when using an instead of insert trigger, which boils down to: let the ETL come up with a unique row_id value inside the insert itself. Since we have ROW_NUMBER() available, that shouldn't be too much to ask imho. Unless someone comes up with a better solution, I might in fact go for this one.
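For what it's worth, here is a rough, untested sketch of how I picture that (assuming SQL Server 2008+): the caller supplies a row_id that only has to be unique within the batch (ROW_NUMBER() is fine), the trigger uses it purely as a correlation key, and the real identity values are captured via MERGE ... OUTPUT, which, unlike a plain INSERT ... OUTPUT, can emit source columns next to the generated identity:

ALTER TRIGGER tr1_v_test
ON v_test
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON

    -- maps the caller-supplied (batch-unique) row_id to the real identity value
    DECLARE @map TABLE (src_row_id int NOT NULL, new_row_id int NOT NULL)

    -- an always-false match turns MERGE into a plain insert, but its OUTPUT
    -- clause may reference source columns, which INSERT ... OUTPUT cannot
    MERGE t_table_x AS tgt
    USING (SELECT row_id, value_a, value_b FROM inserted) AS src
       ON 1 = 0
    WHEN NOT MATCHED THEN
        INSERT (value_a, value_b) VALUES (src.value_a, src.value_b)
    OUTPUT src.row_id, Inserted.row_id INTO @map (src_row_id, new_row_id);

    INSERT t_table_y (row_id, value_c, value_d)
    SELECT m.new_row_id, i.value_c, i.value_d
    FROM   inserted i
    JOIN   @map m
      ON   m.src_row_id = i.row_id
END
GO

-- ETL side: row_id only needs to be unique within this one INSERT
INSERT v_test (row_id, value_a, value_b, value_c, value_d)
SELECT Row_Number() OVER (ORDER BY (SELECT NULL)),
       t.name, Convert(varchar, t.object_id), c.name, Convert(varchar, c.column_id)
FROM sys.tables t
JOIN sys.columns c
  ON c.object_id = t.object_id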