1

I believe SQL Server indexed views are validating inserts in the base table incorrectly.

In order to simulate, consider the following

Create table:

CREATE TABLE [dbo].[table_e]
(
    [id] [int] NOT NULL,
    [module] [varchar](50) NULL,
    [event] [varchar](50) NULL,
    [params] [nvarchar](max) NULL,

    CONSTRAINT [PK_table_e] 
        PRIMARY KEY CLUSTERED ([id] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,  
                          ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

Populate table

INSERT INTO [dbo].[table_e] ([id], [module], [event], [params])
VALUES (1, 'ModuleB', 'EventT', N'[{"type":"AccountId","value":"AccountX"},{"type":"AccountId","value":"AccountZ"},{"type":"Balance","value":10},{"type":"Balance","value":10}]'),
(2, 'ModuleB', 'EventT', N'[{"type":"AccountId","value":"AccountY"},{"type":"AccountId","value":"AccountX"},{"type":"Balance","value":20}]'),
(3, 'ModuleP', 'EventA', N'[{"type":"AccountId","value":"AccountZ"},{"type":"AccountId","value":"AccountY"},{"type":"Hash","value":"SomeHash"}]')

Check to see if all is well

SELECT * 
FROM [dbo].[table_e] -- returning 3 out of 3

Create the schemabound view

CREATE VIEW [dbo].[iv_test]
WITH SCHEMABINDING
AS
    SELECT 
        e.[id],
        CAST(JSON_VALUE(e.[params], '$[0].value') AS CHAR(66)) AS [AccountAddress_From],
        CAST(JSON_VALUE(e.[params], '$[1].value') AS CHAR(66)) AS [AccountAddress_To],
        CAST(JSON_VALUE(e.[params], '$[2].value') AS DECIMAL (36)) AS [Amount_Transferred],
        CAST(JSON_VALUE(e.[params], '$[3].value') AS DECIMAL (36)) AS [Amount_Fees]
    FROM 
        [dbo].[table_e] e 
    WHERE 
        e.[module] = 'ModuleB' AND e.[event] = 'EventT'
GO

Check to see if all is well

SELECT * 
FROM [dbo].[iv_test]     -- returning 2 out of 3

Clear the table

DELETE FROM [table_e] --3 rows affected

Materialize the view by creating a clustered index:

CREATE UNIQUE CLUSTERED INDEX [PK_iv_test] 
ON [dbo].[iv_test]([id] ASC)
         WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
               SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, 
               DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
               ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

Now try to add the same data as before

INSERT INTO [dbo].[table_e] ([id], [module], [event], [params])
VALUES (1, 'ModuleB', 'EventT', N'[{"type":"AccountId","value":"AccountX"},{"type":"AccountId","value":"AccountZ"},{"type":"Balance","value":10},{"type":"Balance","value":10}]'),
(2, 'ModuleB', 'EventT', N'[{"type":"AccountId","value":"AccountY"},{"type":"AccountId","value":"AccountX"},{"type":"Balance","value":20}]'),
(3, 'ModuleP', 'EventA', N'[{"type":"AccountId","value":"AccountZ"},{"type":"AccountId","value":"AccountY"},{"type":"Hash","value":"SomeHash"}]')

Result: INSERT does not work. Creation of the clustered index is blocking my INSERT with this error:

Msg 8115, Level 16, State 6, Line 105
Arithmetic overflow error converting nvarchar to data type numeric. The statement has been terminated.

What is happening here?

It looks like the where clause applied to the indexed view e.g. e.[module_id] = 'Balances' AND e.[event_id] = 'Transfer' is not applied.

As a result all data inserted into the base table is checked against the view [iv_test] while in my opinion, only the data inserted that meets the WHERE conditions as specified in the view [iv_test] should be checked against the format in the view [iv_test].

The funny thing is: The following approach does work:

  • First insert records with ID 1 & 2;
  • Then create clustered index;
  • Then insert record with ID 3;

This off course does not solve my problem, but makes it obvious there is some WITH NOCHECK constraint active for the indexed view after creating the clustered index.

Any thoughts are welcome

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user158936
  • 11
  • 2

1 Answers1

1

I have a "solution" for you, but not a full explanation.

In your indexed view definition, change the last two cast to try_cast:

CREATE VIEW [dbo].[iv_test]
WITH SCHEMABINDING
AS
SELECT 
    e.[id]
    ,CAST(JSON_VALUE(e.[params], '$[0].value') AS CHAR(66)) AS [AccountAddress_From]
    ,CAST(JSON_VALUE(e.[params], '$[1].value') AS CHAR(66)) AS [AccountAddress_To]
    ,TRY_CAST(JSON_VALUE(e.[params], '$[2].value') AS DECIMAL (36)) AS [Amount_Transferred]
    ,TRY_CAST(JSON_VALUE(e.[params], '$[3].value') AS DECIMAL (36)) AS [Amount_Fees]
FROM [dbo].[table_e] e 
WHERE e.[module] = 'ModuleB' AND e.[event] = 'EventT'
GO

I know what you're thinking:

But the where clause on the view definition should be filtering out the rows where the cast would fail, so these shouldn't be materialized, so the materialization should succeed

And yeah, logically I guess that's right... that's why it works when you create the index after the inserting the first rows (which will happily cast to decimal), and then add the third row.

So I guess the deduction to make is that there must be a difference in the order of operations performed by the engine when creating the filtered index on for multiple rows of data vs what it does when inserting one row into the table. Specifically, it seems the engine evaluates the view output for the data and then applies the predicate when deciding what to put into the index.

I bet, say, Paul White could provide details on exactly what's going on under the covers.

allmhuran
  • 4,154
  • 1
  • 8
  • 27
  • Yes, TRY_CAST is an interesting option. However, due to the fact that there can be an infinite combination of values / value types, I probably end up with an unmanageable CASE construction (CASE WHEN TRY_CAST IS NULL ...). So influencing the engine evaluation sequence (without creating unwanted side effects) would be my preferred choice. This might take some additional time investigating but hopefully provides a more durable solution. – user158936 Sep 06 '21 at 17:55
  • @user158936 if there's no way to know what the column types in the view are, then obviously you're not going to be able to create a view that tries to cast to any type that could possibly fail a conversion. So all of your view columns that are generated from the parsed json must be either `varchar`, or `sql_variant`, in which case you can just use `cast`. – allmhuran Sep 06 '21 at 18:02
  • Got an unexpected surprise. If I split the inserts and group them per module and event it works fine ...?! – user158936 Sep 06 '21 at 18:06
  • INSERT INTO [dbo].[table_e] ([id], [module], [event], [params]) VALUES (1, 'ModuleB', 'EventT', N'[{"type":"AccountId","value":"AccountX"},{"type":"AccountId","value":"AccountZ"},{"type":"Balance","value":10},{"type":"Balance","value":10}]'), (2, 'ModuleB', 'EventT', N'[{"type":"AccountId","value":"AccountY"},{"type":"AccountId","value":"AccountX"},{"type":"Balance","value":20}]') – user158936 Sep 06 '21 at 18:07
  • INSERT INTO [dbo].[table_e] ([id], [module], [event], [params]) VALUES (3, 'ModuleP', 'EventA', N'[{"type":"AccountId","value":"AccountZ"},{"type":"AccountId","value":"AccountY"},{"type":"Hash","value":"SomeHash"}]') – user158936 Sep 06 '21 at 18:07
  • So evaluation is per INSERT – user158936 Sep 06 '21 at 18:08
  • Anyone any additions? If not it seems you all have helped my brain to go in the right direction :-) – user158936 Sep 06 '21 at 18:09
  • @user158936 Actually I think would expect that. I bet it doesn't work if there's more than one row being inserted at the same time that can't be `cast`, though. – allmhuran Sep 06 '21 at 18:18
  • Yes, you are absolutely right. It needs some more investigation. – user158936 Sep 07 '21 at 07:09
  • "@user158936 if there's no way to know what the column types in the view are". I do know what the types are. It is just a matter of the engine that seems to evaluate the view output for the data and then applies the predicate when deciding what to put into the index instead of first applying the predicate like you mentioned. – user158936 Sep 07 '21 at 07:13
  • @user158936 I don't think I'm following. You said you didn't want to use the `try_cast` solution because it would require an unmanageable case construction because of the number of different possible types. But if you know what the type needs to be - for example, those two columns in your question need to be `decimal(36)` - then you don't need a `case`, just `try_cast` to `decimal(36)`, no? – allmhuran Sep 07 '21 at 09:16
  • Yes, try_cast it will be. I seems I am unable to influence the executionplan in the way I need. So yes. Thank you! – user158936 Sep 07 '21 at 18:19