0

I have a simple query in T-SQL:

SELECT
  *
FROM
  Table t
WHERE
  t.Column IN ( 'Value1', 'Value2', 'Value3', ..., 'ValueN' )
;

Of course, the query is actually much more complex with a couple of JOINs and subqueries, but that does not matter at the moment.

The question is: Which of the following is faster in terms of performance?

(1) The original condition of

t.Column IN ( 'Value1', 'Value2', 'Value3', ..., 'ValueN' )

(2) Using a table ValueEnumeration with just one column named Value (being possibly a primary key), the table being populated with the values 'Value1', 'Value2', ...

SELECT
  *
FROM
  Table t
WHERE
  t.Column in ( SELECT ve.Value FROM ValueEnumeration ve )
;

(3) Using a user defined function (UDF), a scalar function to be precise, called IsAllowedValue.

The function:

CREATE FUNCTION dbo.IsAllowedValue ( @ValueToCheck VARCHAR(20) ) RETURNS INT
AS
BEGIN
  IF @ValueToCheck = 'Value1'
     OR @ValueToCheck = 'Value2'
     OR @ValueToCheck = 'Value3'
     ...
     OR @ValueToCheck = 'ValueN'
  BEGIN
    RETURN 1;
  END;
  RETURN 0;
END
;

The query:

SELECT
  *
FROM
  Table t
WHERE
  dbo.IsAllowedValue(t.Column) = 1
;

Well, I guess the first one will be the quickest solution. However, I need to perform a similar check in many places in my stored procedures. Once the original enumeration of values changes in the future (which is very likely to happen - e.g. a new value has to be added to it), you will have to go to all the occurrences of the original condition in your code and add the new value in there. Therefore, I decided for a more re-usable solution. But I don't know which one to choose. I sometimes need to do a test the other way around (WHERE t.Column NOT IN (...)). It also came to my mind to use the table ValueEnumeration in an INNER JOIN (for positive checks) or a LEFT OUTER JOIN (for negative checks), but this will be painful to implement since I have approx. 50 locations of such conditions in the code and, in general, adding a JOIN considerably changes the look of the SQL query as well as the execution plan, the latter not always for good.

Do you have any idea?

SynozeN Technologies
  • 1,337
  • 1
  • 14
  • 19
Pavel Foltyn
  • 175
  • 3
  • 13
  • I suppose 2 will be fastest for `n > 20` and 3 will be slowest (because functions are slow and require full table scan). Though - claims about performance are inherently flawed and I recommend to check it on your own. – Ginden May 30 '17 at 08:45
  • 2
    [Race your horses.](https://ericlippert.com/2012/12/17/performance-rant/). Intuitively, I would say the first option is the best bot from the readability and performance perspective. – Zohar Peled May 30 '17 at 08:48
  • @Ginden I have currently some 15 values in the table ValueEnumeration. Thanks for your comment. And you're right. I'll have to try all the options on my own... – Pavel Foltyn May 30 '17 at 09:37
  • Do you have an `INDEX` on `column`? – Rick James Jul 24 '17 at 03:47

2 Answers2

0

Solution 2 ist fine as long as you store the allowed values before invoking the query. This will not impact your performance (you will get the values once, not for every record in a table) and will be more reusable than Solution 1.

declare @AllowedValues table(val varchar(...))

insert into @AllowedValues 
SELECT ve.Value FROM ValueEnumeration ve

Then you can use it in your code:

......
WHERE
  t.Column in ( SELECT val FROM @AllowedValues )
Jan
  • 127
  • 1
  • 10
  • Jan, your syntax `t.Column in ( @AllowedValues )` does not work for T-SQL (**Must declare the scalar variable "@AllowedValues"**). Do you mean `t.Column in ( SELECT val FROM @AllowedValues )` ? – Pavel Foltyn May 30 '17 at 09:35
  • Ok, thanks. But I don't see the difference between `SELECT val FROM @AllowedValues` and `SELECT Value FROM ValueEnumeration`. Do you suppose table variables with a few values (< 20 rows) better performing than ordinary tables? – Pavel Foltyn May 30 '17 at 09:54
  • Yes, I do. I have no possibility to carefully test it right now, but take a look there for example https://stackoverflow.com/questions/11857789/when-should-i-use-a-table-variable-vs-temporary-table-in-sql-server – Jan May 30 '17 at 09:56
0

Well, I ultimately decided for the 3rd (generally not recommended) solution (creating a UDF). It appears to be O.K. in terms of performance. Or, at least, it is not slower than the 2nd solution (a table of the "allowed" values).

A function, although generally considered to be a bottleneck of many SQL queries, provides a couple of advantages:

(i) It is re-usable and easy to adjust (if some new values have to be added in the future, for example).

(ii) Unlike a table of enumerated values, whenever you see DDL of the function, the function definition, you can look at the values (the constants) currently in use (the advantage of the 1st solution which is not re-usable, though). If you used a table, you would have to execute a SELECT to check for values currently in place.

(iii) Even syntactically, it is easier to write

dbo.IsAllowedValue(t.Column) = 1

than

t.Column IN (SELECT Value FROM ValueEnumeration)

I'll provide more comments on the topic if any bad/good experience comes to me in the future.

Pavel Foltyn
  • 175
  • 3
  • 13