
I have a stored procedure that takes a comma-separated string as input. The string can be quite large, sometimes more than 8,000 characters, and when it is, query performance degrades. I also believe there is a limit on the length of the contents of an IN clause, and I sometimes get errors because of it. I need to know whether it is better to use a custom TABLE TYPE as the parameter and an INNER JOIN to find the result, and if so, why. Here are my 2 stored procedures (minimal code):

CREATE TYPE [dbo].[INTList] AS TABLE(
    [ID] [int] NULL
)

Procedure 1

CREATE PROCEDURE [report].[GetSKU]
  @list [INTList] READONLY
AS

SELECT sk.SKUID, sk.Code AS SCode, sk.SName
FROM SKUs AS sk
INNER JOIN @list AS sst ON sst.ID = sk.SKUID
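
For reference, a minimal sketch of how a caller would populate and pass the table type into Procedure 1 (the sample IDs are made up):

```sql
-- Build a table variable of the custom type and pass it to the procedure.
DECLARE @ids [dbo].[INTList];

INSERT INTO @ids (ID)
VALUES (101), (102), (103);

EXEC [report].[GetSKU] @list = @ids;
```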

Procedure 2

CREATE PROCEDURE [report].[GetSKU]
  @params varchar(max)
AS
SELECT sk.SKUID, sk.Code AS SCode, sk.SName
FROM SKUs AS sk
WHERE CHARINDEX(',' + CAST(sk.SKUID AS varchar(MAX)) + ',', @params) > 0

Now, which procedure is better to use?

Note: The original stored procedures have a few more joins.

Dale K
Srijon Chakraborty
  • So your question is: why is passing a list of values better than passing a comma separated string? – Dale K Nov 18 '21 at 05:22
  • @DaleK Yes. You got my question right. I just need to know which process should I select and why? What can be the limitation if I use the procedure 1 (if any). Or you can suggest any other process which might be better than these two. Thank you. – Srijon Chakraborty Nov 18 '21 at 05:28
  • 3
    TVP is definitely preferable to awful string splitting. (Also your second example needs to also surround `@params` with `','` on each side. It's ugly and can never use an index, whereas the join to a table type can.) But as is your question isn't really a good form for this site... what does "better" mean? Does it mean the same thing to you as it does to me? – Aaron Bertrand Nov 18 '21 at 05:40
  • 3
    Procedure 1 is the correct, proper, best practice way to do it. – Dale K Nov 18 '21 at 05:47
  • @AaronBertrand I am sorry that you found my question not in good form; however, I need to know which one is better, why, and what the limitations can be. Thank you :) – Srijon Chakraborty Nov 18 '21 at 05:57
  • @DaleK Thank you. Is there any limitation in the procedure One ? – Srijon Chakraborty Nov 18 '21 at 06:00
  • Probably... everything has limitations... but I've never had any issues with it. And there are a lot of known limitations with procedure 2. – Dale K Nov 18 '21 at 06:03
  • 2
    [Erland Sommarskog says](https://www.sommarskog.se/arrays-in-sql.html#performance) the best solution is a TVP, and I'd be surprised if anyone can prove differently – Charlieface Nov 18 '21 at 07:52
  • 1
    One limitation with a TVP (which is the *best* solution in an academic point of view) is: It must exist. Sometimes we work against various databases and it might be easier not to rely on anything existing there... A very fast approach (needs v2016+) is ['STRING_SPLIT()'](https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15), but this is **not position safe**!, or [JSON-splitting](https://stackoverflow.com/a/38275075/5089204). Both can return a list you can use within `IN()` or you can JOIN against. – Shnugo Nov 18 '21 at 08:46
  • @Shnugo `STRING_SPLIT()` is position safe in Azure SQL DB, Azure SQL MI, and will be in SQL Server 2022 ([I wrote about the change here](https://www.mssqltips.com/sqlservertip/7039/stringsplit-function-azure-sql-database-ordinal-position/)), but 90% of the time position within the string is irrelevant. – Aaron Bertrand Nov 18 '21 at 19:04
  • @AaronBertrand probably right, but I had quite a lot of cases where position is important. Did you see the position *and type safe* approach with doubled array brackets and a With-clause (link in my comment above)? I used this for easy csv imports. Quite often we find two or more lists which are to be joined by their position (like names and values). – Shnugo Nov 18 '21 at 19:17
  • @Shnugo Yes I wrote about something similar recently but I only just now saw your addendum to the above answer ([article here](https://www.mssqltips.com/sqlservertip/7034/sql-split-string-with-openjson/)). Also, yes, my 90% is anecdotal. I visited a _lot_ of code recently in the Wayfair code base and that's where my 90% comes from. Almost everything I saw was `WHERE col IN dbo.CLR_Split(@List, ',')`. – Aaron Bertrand Nov 18 '21 at 19:53
  • @AaronBertrand well, yeah, but such a function (as the TVP above) must exist in advance... – Shnugo Nov 18 '21 at 20:48
  • TVPs can also have primary key constraints added to them - which may further benefit performance/data integrity in cases where the rows have a pk. – Moe Sisko Nov 19 '21 at 02:29

2 Answers


As this question raised quite a bit of discussion in the comments but did not get a viable answer, I'd like to summarize the major points in order to help future research.

This question is about: How do I pass a (large) list of values into a query?

In most cases, people need this either in a `WHERE SomeColumn IN (SomeValueList)` filter or to `JOIN` against the list with something like `FROM MyTable INNER JOIN SomeValueList ON ...`.

The SQL Server version is very important, as v2016 gave us two great tools: the native `STRING_SPLIT()` (not position-safe!) and JSON support.

Furthermore, and rather obviously, we have to think about scale and the values themselves.

  • Do we pass in a simple list of some IDs or a huge list with thousands of values?
  • Do we talk about simple integers or GUIDs?
  • And what about text values, where we have to think about dangerous characters (like `[ { "` in JSON or `< &` in XML - there are many more...)?
  • What about CSV-lists, where the separating character might appear within the content (quoting / escaping)?
  • In some cases we might even want to pass several columns at once...

There are several options:

  • Table valued parameter (TVP, CREATE TYPE ...),
  • CSV together with string splitting functions (native since v2016, various home brewed, CLR...),
  • and text-based containers: XML or JSON (since v2016)

Table valued parameter (TVP - the best choice)

A table valued parameter (TVP) must be created in advance (this might be a drawback) but will behave like any other table once created. You can add indexes, you can use it in various use cases, and you do not have to worry about anything under the hood.
Sometimes we cannot use this due to missing rights to run CREATE TYPE...
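
As noted in the comments, a sketch of how a primary key can be declared directly on the type, which gives the optimizer a unique index to work with (the type name here is made up):

```sql
-- Variant of the type with a primary key; NULLs are no longer allowed.
CREATE TYPE [dbo].[INTListPK] AS TABLE (
    [ID] int NOT NULL PRIMARY KEY
);
```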

Character separated values (CSV)

With CSV we see three approaches

  • Dynamic SQL: Create a statement where the CSV list is simply stuffed into the `IN ()` and execute it dynamically. This can be a very efficient approach, but is open to various obstacles (no ad-hoc usage, injection threat, breaking on bad values...)

  • String splitting functions: There are tons of examples around... All of them have in common that the separated string is returned as a list of items. Common issues: performance, missing ordinal position, limits on the separator, handling of duplicate or empty values, handling of quoted or escaped values, and separators appearing within the content. Aaron Bertrand did some great research on the various approaches to string splitting. Similar to TVPs, one drawback is that the function must exist in the database in advance, or that we need permission to run CREATE FUNCTION if it does not.

  • Ad-hoc splitters: Before v2016 the most common approach was XML-based; since then we have moved to JSON-based splitters. Both use string methods to transform the CSV string into 1) separated elements (XML) or 2) a JSON array. The result is queried with 1) XQuery (`.value()` and `.nodes()`) or 2) JSON's `OPENJSON()` or `JSON_VALUE()`.
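
A rough sketch of both splitter styles (v2016+ assumed; note that `STRING_SPLIT()` does not guarantee ordinal position before SQL Server 2022, and the sample values are made up):

```sql
DECLARE @csv varchar(max) = '101,102,103';

-- Native splitter: fast, but no guaranteed item order.
SELECT value
FROM STRING_SPLIT(@csv, ',');

-- JSON-based ad-hoc splitter: the [key] column preserves the position.
SELECT j.[key] AS Position, j.[value] AS ID
FROM OPENJSON('[' + @csv + ']') AS j;
```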

Text based containers

We can pass the list as a string, but in a defined format:

  • Using `["a","b","c"]` instead of `a,b,c` allows for immediate usage of `OPENJSON()`.
  • Using `<x>a</x><x>b</x><x>c</x>` instead allows for XML queries.

The biggest advantage here: Any programming language provides support for these formats.
Common obstacles like date and number formatting are solved implicitly. Passing JSON or XML is - in most cases - just a few lines of code.
Both approaches allow for type- and position-safe queries.
We can solve our needs without the need to rely on anything existing in advance.
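
A minimal sketch of the JSON-container approach, joining the passed list against the tables from the question (names taken from there; compatibility level 130+ and made-up sample IDs assumed):

```sql
DECLARE @json nvarchar(max) = N'[101,102,103]';

SELECT sk.SKUID, sk.Code AS SCode, sk.SName
FROM SKUs AS sk
INNER JOIN OPENJSON(@json) WITH (ID int '$') AS list
    ON list.ID = sk.SKUID;
```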

Shnugo

For the very best performance you can use this function:

CREATE FUNCTION [dbo].StringSplit
(
    @String VARCHAR(MAX),
    @Separator CHAR(1)
)
RETURNS @RESULT TABLE (Value VARCHAR(MAX))
AS
BEGIN
    DECLARE @SeparatorPosition INT = CHARINDEX(@Separator, @String),
            @Value VARCHAR(MAX),
            @StartPosition INT = 1;

    -- No separator found: return the whole string as a single row.
    IF @SeparatorPosition = 0
    BEGIN
        INSERT INTO @RESULT VALUES (@String);
        RETURN;
    END;

    -- Append a trailing separator so the last item is captured too.
    SET @String = @String + @Separator;

    WHILE @SeparatorPosition > 0
    BEGIN
        SET @Value = SUBSTRING(@String, @StartPosition, @SeparatorPosition - @StartPosition);

        -- Skip empty items (e.g. from doubled separators).
        IF (@Value <> '')
            INSERT INTO @RESULT VALUES (@Value);

        SET @StartPosition = @SeparatorPosition + 1;
        SET @SeparatorPosition = CHARINDEX(@Separator, @String, @StartPosition);
    END;

    RETURN;
END

This function returns a table - `SELECT * FROM dbo.StringSplit('12,13,14,15,16', ',')` - so you can join it to your table or use it in an `IN` clause.
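
For illustration, joining the function's result to the question's table might look like this (the CAST is needed because the function returns varchar values; sample IDs are made up):

```sql
SELECT sk.SKUID, sk.Code AS SCode, sk.SName
FROM SKUs AS sk
INNER JOIN dbo.StringSplit('101,102,103', ',') AS s
    ON s.Value = CAST(sk.SKUID AS varchar(20));
```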

Ramin Faracov
  • 1
    On what basis do you suggest this performs better than using a table valued parameter to pass the values in? – Dale K Nov 18 '21 at 06:59
  • I tested it for 2,000 characters: your procedure takes 22 seconds, but my function takes 7 milliseconds. Your procedure is slow because you are using CHARINDEX to parse @params in the WHERE clause of the select query, so the DB cannot use an index. My function doesn't select data from disk and doesn't loop over selected data. – Ramin Faracov Nov 18 '21 at 08:22
  • 1
  • @RaminFaracov The combination of a *multi-statement* table valued function (a **very bad** choice in almost all cases) with a `WHILE`-loop (an **awful choice** in terms of performance) will - for sure! - not bring the *very best performance*... Run this once on a rather short list? You can take whatever you want. Run this a million times with large sets? Get rid of this approach. Read about fast string splitters [here (Aaron Bertrand)](https://sqlperformance.com/2021/09/t-sql-queries/split-strings) and - from v2016 - [read about JSON-splitters](https://stackoverflow.com/a/38275075/5089204) – Shnugo Nov 18 '21 at 08:32
  • "Your procedure performance is slow because you are using CHARINDEX" - there is no charindex in the table valued parameter procedure? Procedure 1? – Dale K Nov 18 '21 at 08:38
  • @RaminFaracov, one hint: Without a given sorting (e.g. an `IDENTITY` column) your function might even break up the item order. This might not be important for the given use case, but is a real drawback (like with the existing `STRING_SPLIT()` function (v2016+)). – Shnugo Nov 18 '21 at 09:51