I would like to create a subquery that produces a list of numbers as a single-column result, something like MindLoggedOut did here but without the @x
xml variable, so that it can be appended to a WHERE
expression as a pure string (subquery) without sql parameters. The problem is that the replacement of the parameter (or variable) makes the query run 5000 times slower, and I don't understand why. What causes this big difference?
Example:
/* Create a minimalistic xml like <b><a>78</a><a>91</a>...</b> */
DECLARE @p_str VARCHAR(MAX) =
'78 91 01 12 34 56 78 91 01 12 34 56 78 91 01 12 34 56';
DECLARE @p_xml XML = CONVERT(XML,
'<b><a>'+REPLACE(@p_str,' ','</a><a>')+'</a></b>'
);
SELECT a.value('(child::text())[1]','INT')
FROM (VALUES (@p_xml)) AS t(x)
CROSS APPLY x.nodes('//a') AS x(a);
This returns one number per row and is quite fast (20x faster than the string-splitter approaches I was using so far, similar to these.
I measured the 20x speed-up in terms of sql server CPU time, with @p_str
containing 3000 numbers.)
Now if I inline the definition of @p_xml
into the query:
SELECT a.value('(child::text())[1]','INT')
FROM (VALUES (CONVERT(XML,
'<b><a>'+REPLACE(@p_str,' ','</a><a>')+'</a></b>'
))) AS t(x)
CROSS APPLY x.nodes('//a') AS x(a);
then it becames 5000x slower (when @p_str
contains thousands of numbers.) Looking at the query plan I cannot find the reason for it.
Plan of the first query (
…VALUES(@p_xml)…
), and the second (…VALUES(CONVERT(XML,'...'))…
)
Could somebody shed some light on it?
UPDATE
Clearly the plan of the first query doesn't include the cost
of the @p_xml = CONVERT(XML, ...REPLACE(...)... )
assignment, but this
cost is not the culprit that could explain the 46ms vs. 234sec
difference between the execution time of the whole script (when
@p_str
is large). This difference is systematic (not random)
and was in fact observed in SqlAzure (S1 tier).
Furthermore, when I rewrote the query: replacing CONVERT(XML,...)
by a user-defined scalar function:
SELECT a.value('(child::text())[1]','INT')
FROM (VALUES (dbo.MyConvertToXmlFunc(
'<b><a>'+REPLACE(@p_str,' ','</a><a>')+'</a></b>'
))) AS t(x)
CROSS APPLY x.nodes('//a') AS x(a);
where dbo.MyConvertToXmlFunc()
is:
CREATE FUNCTION dbo.MyConvertToXmlFunc(@p_str NVARCHAR(MAX))
RETURNS XML BEGIN
RETURN CONVERT(XML, @p_str);
END;
the difference disappeared (plan). So at least I have a workaround... but would like to understand it.