
I have a table in a SQL Server database which has a column that stores JSON.

Table - SomeTable
Id | Properties
1  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
2  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
...|...

I wrote a SELECT query which selects the value of each field separately:

SELECT
    JSON_VALUE(Properties, '$.field1') as field1,
    JSON_VALUE(Properties, '$.field2') as field2,
    JSON_VALUE(Properties, '$.field3') as field3,
    JSON_VALUE(Properties, '$.field4') as field4
FROM SomeTable

I found this approach in Microsoft's docs (https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15)

Can writing many JSON_VALUEs in a query harm performance? Does SQL Server deserialize the string for each JSON_VALUE written in the query?

2 Answers


You may try to use OPENJSON() with an explicit schema to parse the JSON stored in the Properties column with a single function call (instead of four or more JSON_VALUE() calls):

Table:

CREATE TABLE SomeTable (
   Id int,
   Properties varchar(1000)
)
INSERT INTO SomeTable (Id, Properties)
VALUES
   (1, '{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}'),
   (2, '{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}')

Statement:

SELECT s.Id, j.*
FROM SomeTable s
CROSS APPLY OPENJSON(s.Properties) WITH (
   field1 varchar(100) '$.field1',
   field2 varchar(100) '$.field2',
   field3 varchar(100) '$.field3',
   field4 varchar(100) '$.field4'
) j

Result:

Id  field1  field2  field3  field4
----------------------------------
1   value1  value2  value3  value4
2   value1  value2  value3  value4

As an additional note, the result from JSON_VALUE() is a scalar value of type nvarchar(4000). With OPENJSON() and an explicit schema you may define the appropriate data type for each returned column.
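
For example, here is a minimal sketch with a hypothetical JSON document holding non-string values (the qty, price and shipped fields are made up for illustration, they are not from the question):

SELECT j.*
FROM (VALUES (N'{"qty":3, "price":19.99, "shipped":"2020-07-29"}')) v(Properties)
CROSS APPLY OPENJSON(v.Properties) WITH (
   qty     int          '$.qty',     -- returned as int, not nvarchar(4000)
   price   decimal(9,2) '$.price',
   shipped date         '$.shipped'
) j

Each column comes back already converted to the declared type, so no extra CAST/CONVERT is needed afterwards.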

Zhorov

  • That is it! Thanks for the response. – Tigran Petrosyan Jul 29 '20 at 13:01
  • @TigranPetrosyan, OPENJSON is better if you want to select multiple columns. See the performance comparison done in this SO answer: https://stackoverflow.com/questions/62506490/t-sql-to-get-the-5th-word-in-each-row-in-a-database/62508469#62508469 – Venkataraman R Jul 29 '20 at 13:03

I thought I would run a test regarding your question:

Can writing many JSON_VALUEs in a query harm performance? Does SQL Server deserialize the string for each JSON_VALUE written in the query?

So I created the below script and tested it on my home workstation:

/*
Id | Properties
1  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
2  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
*/

CREATE TABLE dbo.YourTable (ID int IDENTITY,
                            Properties nvarchar(4000));

-- Build a tally of 1,000,000 rows (10 values cross joined six times = 10^6)
WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4, N N5, N N6)
INSERT INTO dbo.YourTable (Properties)
SELECT N'{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}'
FROM Tally;
GO

DECLARE @StartTime datetime2(7) = SYSDATETIME();

SELECT
    JSON_VALUE(Properties, '$.field1') as field1,
    JSON_VALUE(Properties, '$.field2') as field2,
    JSON_VALUE(Properties, '$.field3') as field3,
    JSON_VALUE(Properties, '$.field4') as field4
FROM dbo.YourTable;

PRINT DATEDIFF(MILLISECOND,@StartTime, SYSDATETIME());

SET @StartTime = SYSDATETIME();

SELECT YT.Id,
       OJ.field1,
       OJ.field2,
       OJ.field3,
       OJ.field4
FROM dbo.YourTable YT
     CROSS APPLY OPENJSON(YT.Properties)
                 WITH(field1 varchar(50),
                      field2 varchar(50),
                      field3 varchar(50),
                      field4 varchar(50)) OJ;

PRINT DATEDIFF(MILLISECOND,@StartTime, SYSDATETIME());

-- GO 10 reruns the whole batch above (both timed queries) 10 times
GO 10

DROP TABLE dbo.YourTable

This uses your solution with JSON_VALUE, and also an OPENJSON solution (which I'd written around the same time as Zhorov posted his answer). On my workstation, this resulted in the below output in the Messages tab:

(1000000 rows affected)
5273
(1000000 rows affected)
3560
(1000000 rows affected)
5196
(1000000 rows affected)
3329
(1000000 rows affected)
5097
(1000000 rows affected)
3320
(1000000 rows affected)
5219
(1000000 rows affected)
3379
(1000000 rows affected)
5133
(1000000 rows affected)
3239
(1000000 rows affected)
5137
(1000000 rows affected)
3352
(1000000 rows affected)
5080
(1000000 rows affected)
3348
(1000000 rows affected)
5126
(1000000 rows affected)
3320
(1000000 rows affected)
5146
(1000000 rows affected)
3360
(1000000 rows affected)
5310
(1000000 rows affected)
3354

Notice that the second timing in each pair, from the OPENJSON query, is about 2 seconds faster on every run. This is, admittedly, with 1,000,000 rows, but it does confirm that a single parse of the JSON is far faster than four.
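If you want to reproduce the comparison without the DATEDIFF bookkeeping, SET STATISTICS TIME is another option; a minimal sketch against the same test table (assuming dbo.YourTable from the script above still exists):

-- Ask SQL Server to report CPU and elapsed time for each statement
SET STATISTICS TIME ON;

SELECT JSON_VALUE(Properties, '$.field1') AS field1
FROM dbo.YourTable;

SELECT OJ.field1
FROM dbo.YourTable YT
     CROSS APPLY OPENJSON(YT.Properties)
                 WITH (field1 varchar(50)) OJ;

SET STATISTICS TIME OFF;

The timings land in the Messages tab alongside the row counts, the same place the PRINT output above appears.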

Thom A