3

I'm storing string prefixes in my SQL Server table, I'm wanting to see if any of those values are a valid prefix for a given parameter value.

e.g. supposing I have a telephone do-not-call list and it includes an entry to prohibit all phone calls to numbers starting with "1425123", rather than inserting 10000 numbers (14251230000 to 14251239999) it stores the prefix instead.

Like so:

CREATE TABLE Prefixes (
    Value varchar(10)
)

CREATE INDEX IX_Value UNIQUE Prefixes ( Value )

Evaluated like so:

DECLARE @value varchar(10) = 'foobar'

SELECT
    *
FROM
    Prefixes
WHERE
    @value LIKE ( Value + '%' );

When I run this in Azure SQL in SQL Server Management Studio it says it's performing an Index Scan. With approximately 70,000 entries on an Azure SQL S1 database the query takes between 200 and 500ms to execute. The tooling does not suggest any improvements to indexes for faster performance.

For comparison, an exact equality match (Value = @value) uses an Index Seek and happens near instantly.

200-500ms is too slow for my application.

One option is to move the lookup into my application code using a Trie for an efficient prefix search (which introduces synchronization problems), but another approach is to change the query to something like this:

DECLARE @v1 varchar(1) = LEFT( @value, 1 )
DECLARE @v2 varchar(2) = LEFT( @value, 2 )
DECLARE @v3 varchar(3) = LEFT( @value, 3 )
DECLARE @v4 varchar(4) = LEFT( @value, 4 )
DECLARE @v5 varchar(5) = LEFT( @value, 5 )
DECLARE @v6 varchar(6) = LEFT( @value, 6 )
DECLARE @v7 varchar(7) = LEFT( @value, 7 )
DECLARE @v8 varchar(8) = LEFT( @value, 8 )
DECLARE @v9 varchar(9) = LEFT( @value, 9 )

SELECT
    *
FROM
    Prefixes
WHERE
    Value = @v1 OR
    Value = @v2 OR
    Value = @v3 OR
    Value = @v4 OR
    Value = @v5 OR
    Value = @v6 OR
    Value = @v7 OR
    Value = @v8 OR
    Value = @v9

When I run this, it's a lot quicker (using an Index Seek) but it feels like a hack, but because I know the length is less than 10 characters I'm okay with it... for now.

Is there a better way? Is there a way SQL Server could do my prefix matching internally (i.e. using the same logic in my final example but without using repetitive and brittle SQL)?

Dai
  • 141,631
  • 28
  • 261
  • 374
  • 1
    Is 70,000 entries really realistic? How many of those are full numbers and how many are prefixes. Also how many entries are redundant? E.g. If Prefixes contains `'1'`, then there could be hundreds or even thousands of entries starting with `1` that would be ignored because `'1'` is already a match for `'14251230000'`. – Disillusioned Feb 05 '17 at 08:25

2 Answers2

2

This is something an auxiliary numbers table can help with.

As you only need 1-10 I've made one inline in the query rather than assuming that one exists.

You could shorten the code by replacing the derived table V with a reference to a permanent numbers table if you have one or can create one.

SELECT IIF(EXISTS (SELECT *
                   FROM   (VALUES(1),(2),(3),
                                 (4),(5),(6),
                                 (7),(8),(9),(10)
                          ) V(number)
                          JOIN Prefixes P WITH(FORCESEEK)
                            ON P.Value = LEFT(@value, number)
                   WHERE  number <= LEN(@value)), 1, 0) AS PrefixExists 

enter image description here

  |--Compute Scalar(DEFINE:([Expr1014]=CASE WHEN [Expr1015] THEN (1) ELSE (0) END))
       |--Nested Loops(Left Semi Join, DEFINE:([Expr1015] = [PROBE VALUE]))
            |--Constant Scan
            |--Nested Loops(Inner Join, OUTER REFERENCES:([Union1010]))
                 |--Filter(WHERE:([Union1010]<=len([@value])))
                 |    |--Constant Scan(VALUES:(((1)),((2)),((3)),((4)),((5)),((6)),((7)),((8)),((9)),((10))))
                 |--Index Seek(OBJECT:([tempdb].[dbo].[Prefixes].[IX_Value] AS [P]), SEEK:([P].[Value]=substring([@value],(1),[Union1010])) ORDERED FORWARD)
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
1

The reason your first option is slow is that it's not sargable, because you're modifying Prefixes.Value in your where clause.

As a result it is impossible to leverage the index.

Your proposed solution is fine (though you're missing a prefix of length 10).

The only thing I'd point out is that surely you'd rather use an EXISTS query? Once you've found a single match then you're done; no need to find more. Also IN is more succint.

I.e.

IF EXISTS (
    SELECT  *
    FROM    Prefixes
    WHERE   Value IN (@v1, @v2, ...)
)
    RETURN 1
ELSE
    RETURN 0

PS If it's really important, you could look into using Full Text Indexing. (Unfortunately I've never used it myself, so cannot help further.) I know it's more work, but may be justified. It used to require running an additional service; and I don't know if that's still the case.


EDIT

Borrowing from Dudu Markovitz's idea which would still be inefficient if for example:

  • @Value = '9999999999' and does not match any prefixes.
  • The reason is that all Prefixes.Value < '9999999999'.
  • But none match the filter @value like value + '%'.
  • So the query would still have to scan all rows.

However I do think this can (with some tweaks) be made efficient by always getting the first value < @value and then checking if that specifically matches @value like value + '%'. The catch is you need to first guarantee Prefixes doesn't contain any "redundant" values (or at least that redundant values can easily be filtered out using a flag).

By redundant I mean any Value that would itself not be valid because it starts with an existing shorter prefix.

You could then use the following query.

SELECT  *
FROM    (
        SELECT  TOP 1 Value as PossiblePrefix
        FROM    Prefixes
                /* WHERE can leverage index;
                   but requires NO redundant Prefixes.Value rows
                   so that it returns only ONE possible prefix that
                   has a chance of matching @Value.*/
        WHERE   Value <= @Value
        ORDER BY Value DESC
        ) pp
WHERE   @Value LIKE pp.PosisblePrefix + '%'

If the optimiser fails to choose the correct index, this would be one of the rare situations I would advocate using an index hint.

Community
  • 1
  • 1
Disillusioned
  • 14,635
  • 3
  • 43
  • 77