SQL Server: Select rows with multiple occurrences of regex match in a column

Question

I’m fairly used to using MySQL, but not particularly familiar with SQL Server. Tough luck, the database I’m dealing with here is on SQL Server 2014.

I have a table with a column whose values are all integers with leading, separating, and trailing semicolons, like these three fictitious rows:

;905;1493;384;13387;29;933;467;28732;
;905;138;3084;1387;290;9353;4767;2732;
;9085;14493;3864;130387;289;933;4767;28732;

What I am trying to do now is to select all rows where more than one number taken from a list of numbers appears in this column. So for example, given the three rows above, if I have the group 905,467,4767, the statement I’m trying to figure out how to construct should return the first two rows: the first row contains 905 and 467; the second row contains 905 and 4767. The third row contains only 4767, so that row should not be returned.

As far as I can tell, SQL Server does not actually support regex directly (and I don’t even know what managed code is), which doesn’t help. Even with regex, I wouldn’t know where to begin. Oracle seems to have a function that would be very useful, but that’s Oracle.

Most similar questions on here deal with finding multiple instances of the same character (usually singular) and solve the problem by replacing the string to match with nothing and counting the difference in length. I suppose that would technically work here, too, but given a ‘filter’ group of 15 numbers, the SELECT statement would become ridiculously long and convoluted and utterly unreadable. Additionally, I only want to match entire numbers (so if one of the numbers to match is 29, the value 29 would match in the first row, but the value 290 in the second row should not match), which means I’d have to include the semicolons in the REPLACE clause and then discount them when calculating the length. A complete mess.

What I would ideally like to do is something like this:

SELECT * FROM table WHERE REGEXP_COUNT(column, ';(905|467|4767);') > 1

– but that will obviously not work, for all kinds of reasons (the most obvious one being the nonexistence of REGEXP_COUNT outside Oracle).

Is there some sane, manageable way of doing this?

I suppose the moral is to use junction tables and not store lists of *numbers* in *strings*. — Gordon Linoff, Oct 30 '17 at 17:41
@GordonLinoff I agree completely! At some point, I do hope to flesh this out into a separate pivot table, but sadly this is the way it’s been for over a decade, and there are myriad places I’ll have to scour to find all the instances where this is referred to in ASP files strewn around the server. :-/ — Janus Bahs Jacquet, Oct 30 '17 at 17:45

Martin Smith · Accepted Answer · 2017-10-31T07:45:06.833

2

You can do

SELECT *
FROM   Mess
       CROSS APPLY (SELECT COUNT(*)
                    FROM   (VALUES (905),
                                   (467),
                                   (4767)) V(Num)
                    WHERE  Col LIKE CONCAT('%;', Num, ';%')) ca(count)
WHERE  count > 1

SQL Fiddle

Or alternatively

WITH Nums
     AS (SELECT Num
         FROM   (VALUES (905),
                        (467),
                        (4767)) V(Num))
SELECT Mess.*
FROM   Mess
       CROSS APPLY (VALUES(CAST(CONCAT('<x>', REPLACE(Col, ';', '</x><x>'), '</x>') AS XML))) x(x)
       CROSS APPLY (SELECT COUNT(*)
                    FROM   (SELECT n.value('.', 'int')
                            FROM   x.x.nodes('/x') n(n)
                            WHERE  n.value('.', 'varchar') <> ''
                            INTERSECT
                            SELECT Num
                            FROM   Nums) T(count)
                    HAVING COUNT(*) > 1) ca2(count)

edited Oct 31 '17 at 07:45

answered Oct 30 '17 at 17:57

Martin Smith

438,706
87
741
845

There are several functions in this that I have never seen before, so I don’t understand quite what’s really going on—but this seems to work perfectly, and without having to declare variables and temp tables, too. Excellent! – Janus Bahs Jacquet Oct 30 '17 at 18:03
@JanusBahsJacquet the `VALUES` clause is a [table value constructor](https://learn.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql) acting as a derived table holding the three (in this case) numbers. For each outer row the `CROSS APPLY` counts how many of those are in the string. Then only ones with the desired count are preserved by the `WHERE` – Martin Smith Oct 30 '17 at 18:06

Chris Mack · Answer 2 · 2017-10-31T14:12:39.147

Could you put your arguments into a table (perhaps using a table-valued function accepting a string (of comma-separated integers) as a parameter) and use something like this?

DECLARE @T table (String varchar(255))
INSERT INTO @T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')

DECLARE @Arguments table (Arg int)
INSERT INTO @Arguments
VALUES
(905)
, (467)
, (4767)

SELECT String
FROM
    @T
    CROSS JOIN @Arguments
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1

And example of using this with a function to generate the arguments:

CREATE FUNCTION GenerateArguments (@Integers varchar(255))
RETURNS @Arguments table (Arg int)
AS

BEGIN

    WITH cte
    AS
    (
        SELECT
            PATINDEX('%,%', @Integers) p
            , LEFT(@Integers, PATINDEX('%,%', @Integers) - 1) n
        UNION ALL
        SELECT
            CASE WHEN PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) + p = p THEN 0 ELSE PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) + p END
            , CASE WHEN PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) = 0 THEN RIGHT(@Integers, PATINDEX('%,%', REVERSE(@Integers)) - 1) ELSE LEFT(SUBSTRING(@Integers, p + 1, LEN(@Integers)), PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) - 1) END
        FROM cte
        WHERE p <> 0
    )

    INSERT INTO @Arguments (Arg)

    SELECT n
    FROM cte

    RETURN

END
GO

DECLARE @T table (String varchar(255))
INSERT INTO @T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
;

SELECT String
FROM
    @T
    CROSS JOIN GenerateArguments('905,467,4767')
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1

This looks ingenious! I’m going to have to give this a whirl to see if I can get it to do what I want (once I mull it over enough to figure out what exactly it does, and how). — Janus Bahs Jacquet, Oct 30 '17 at 17:46

score 0 · Answer 3 · answered Oct 30 '17 at 17:52

You can achieve this using the like function for the regex and row_number to determine the number of matches.

Here we declare the column values for testing:

DECLARE @tbl TABLE (
string NVARCHAR(MAX)
)

INSERT @tbl VALUES
(';905;1493;384;13387;29;933;467;28732;'),
(';905;138;3084;1387;290;9353;4767;2732;'),
(';9085;14493;3864;130387;289;933;4767;28732;')

Then we pass your search parameters into a table variable to be joined on:

DECLARE @search_tbl TABLE (
search_value INT
)

INSERT @search_tbl VALUES
(905),
(467),
(4767)

Finally we join the table with the column to search for onto the search table. We apply the row_number function to determine the number of times it matches. We select from this subquery where the row_number = 2 meaning that it joined at least twice.

SELECT
    string
FROM (
    SELECT
        tbl.string,
        ROW_NUMBER() OVER (PARTITION BY tbl.string ORDER BY tbl.string) AS rn
    FROM @tbl tbl
    JOIN @search_tbl search_tbl ON
        tbl.string LIKE '%;' + CAST(search_tbl.search_value AS NVARCHAR(MAX)) + ';%'
    ) tbl
WHERE rn = 2

score 0 · Answer 4 · answered Oct 30 '17 at 18:07

You could build a where clause like this :

WHERE
case when column like '%;905;%' then 1 else 0 end +
case when column like '%;467;%' then 1 else 0 end +
case when column like '%;4767;%' then 1 else 0 end >= 2

The advantage is that you do not need a helper table. I don't know how you build the query, but the following also works, and is useful if the numbers are in a tsql variable.

 case when column like ('%;' + @n + ';%')  then 1 else 0 end

SQL Server: Select rows with multiple occurrences of regex match in a column

4 Answers4

Linked