As this question did raise quite some discussion in comments but did not get any viable answer, I'd like to add the major points in order to help future research.
This question is about: How do I pass a (large) list of values into a query?
In most cases, people need this either in a WHERE SomeColumn IN(SomeValueList)
-filter or to JOIN
against this with something like FROM MyTable INNER JOIN SomeValueList ON...
.
Very important is the SQL-Server's version, as with v2016 we got two great tools: native STRING_SPLIT()
(not position-safe!) and JSON support.
Furthermore, and rather obvious, we have to think about the scales and values.
- Do we pass in a simple list of some IDs or a huge list with thousands of values?
- Do we talk about simple integers or GUIDs?
- And what's about text values, where we have to think about dangerous characters (like
[ { "
in JSON or < &
in XML - there are many more...)?
- What about CSV-lists, where the separating character might appear within the content (quoting / escaping)?
- In some cases we might even want to pass several columns at once...
There are several options:
- Table valued parameter (TVP,
CREATE TYPE ...
),
- CSV together with string splitting functions (native since v2016, various home brewed, CLR...),
- and text-based containers: XML or JSON (since v2016)
Table valued paramter (TVP - the best choice)
A table valued parameter (TVP) must be created in advance (this might be a draw back) but will behave as any other table once created. You can add indexes, you can use it in various use cases and you do not have to bother about anything under the hood.
Sometimes we cannot use this due to missing rights to use CREATE TYPE
...
Character separated values (CSV)
With CSV we see three approaches
Dynamic Sql: Create a statement, where the CSV list is simply stuffed into the IN()
and execute this dynamically. This can be a very efficient approach, but will be open to various obstacles (no ad-hoc-usage, injection threat, breaking on bad values...)
String splitting functions: There are tons of examples around... All of them have in common that the separated string will be returned as a list of items. Common issues here: performance, missing ordinal position, limits for the separator, handling of duplicate or empty values, handling of quoted or escaped values, handling of separators within the content. Aaron Bertrand did some great research about the various approaches of string splitting. Similar to TVPs one draw back might be, that this function must exist in the database in advance or that we need to be allowed to execute CREATE FUNCTION
if not.
ad-hoc-splitters: Before v2016 the most used approach was XML based, since then we have moved to JSON based splitters. Both use some string methods to transform the CSV string to 1) separated elements (XML) or 2) into a JSON-array. The result is queried by 1) XQuery (.value()
and .nodes()
) or 2) JSON's OPENJSON()
or JSON_VALUE()
.
Text based containers
We can pass the list as string, but within a defined format:
- Using
["a","b","c"]
instead of a,b,c
allows for immediate usage of OPENJSON()
.
- Using
<x>a</x><x>b</x><x>c</x>
instead allows for XML queries.
The biggest advantage here: Any programming language provides support for these formats.
Common obstacles like date and number formatting is solved implicitly. Passing JSON or XML is - in most cases - just some few lines of code.
Both approaches allow for type- and position-safe queries.
We can solve our needs without the need to rely on anything existing in advance.