0

I have the following test string that shows up in a column of my table.

<B99_9>TEST</B99_9><LastDay>TEST</LastDay>

I want to remove the values between specific tags. For example:

REPLACE( <B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9> )

This should give the following output:

<B99_9></B99_9><LastDay>TEST</LastDay>

This kind of does what I need it to do, but what can I do for situations where I don't know what the value is going to be between the tags? Is there a way to implement some Regex in the Replace function's search so I can search for maybe just the beginning/end tags regardless of the value between them?

I understand this could probably work in a CLR function but I'm really curious to see if I can get this work just by using t-sql. Thanks in advance!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
James Vaughn III
  • 145
  • 1
  • 3
  • 11

2 Answers2

3

There is no out of the box support for regex in SQL Server yet though it is easy to deploy CLR code that does this.

But you can use standard string functions for this case

DECLARE @T TABLE
  (
     X VARCHAR(MAX)
  );

INSERT INTO @T
VALUES      ('<B99_9>TEST</B99_9><LastDay>TEST</LastDay>'),
            ('Some prefix <B99_9>TEST</B99_9><LastDay>TEST</LastDay>')

UPDATE [@T]
SET    X = CASE
             WHEN 0 IN ( StartIndex, EndIndex )
                   OR EndIndex <= StartIndex THEN X
             ELSE STUFF(X, AdjStartIndex, EndIndex - AdjStartIndex, '')
           END
FROM   @T [@T]
       CROSS APPLY (VALUES ('<B99_9>','</B99_9>')) V(StartString, EndString)
       CROSS APPLY (VALUES (PATINDEX('%' + StartString + '%', X),PATINDEX('%' + EndString + '%', X))) V2(StartIndex, EndIndex)
       CROSS APPLY (VALUES (StartIndex + LEN(StartString))) V3(AdjStartIndex)

SELECT *
FROM   @T 

Though, as the string is XML, storing it in an XML column would allow you to update it using inbuilt, semantically aware, XML methods rather than string parsing.

DECLARE @T TABLE(X XML);

INSERT INTO @T 
VALUES ('<B99_9>TEST</B99_9><LastDay>TEST</LastDay>'),  
       ('Some prefix <B99_9>TEST</B99_9><LastDay>TEST</LastDay>')


UPDATE @T
SET  X.modify('
  replace value of (/B99_9/text())[1]
  with "" ')  

SELECT *
FROM @T
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Hi Martin, this is awesome, thank you! Is there a way for the above update statement to look for all occurrences of B99_9? I'm able to change the first tag but if it shows up again, its value is still unchanged. Thanks! – James Vaughn III Jul 06 '15 at 15:43
3

If you wont or maybe can't use a CLR due to security reasons, there is a simple way using a CTE inside standard t-sql.

Here is a complete example inclusive demo structure. You can run it on a whole table.

CREATE TABLE #dummyData(id int identity(1,1), teststring nvarchar(255))

INSERT INTO #dummyData(teststring)
VALUES(N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>')

DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

;WITH cte AS(
    SELECT id, STUFF(
                teststring,
                PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
                (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
                N''
            ) as teststring, 1 as iteration
    FROM #dummyData
    -- iterate until everything is replaced
    UNION ALL
    SELECT id, STUFF(
                teststring,
                PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
                (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
                N''
            ) as teststring, iteration+1 as iteration
    FROM cte
    WHERE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring) > 0
)
SELECT c.id, c.teststring 
FROM cte as c
-- Join to get only the latest iteration
INNER JOIN (
            SELECT id, MAX(iteration) as maxIteration
            FROM cte 
            GROUP BY id
        ) as onlyMax
    ON c.id = onlyMax.id
    AND c.iteration = onlyMax.maxIteration

-- Cleanup
DROP TABLE #dummyData

If you want to use the result of the CTE in an update. You can just replace the part after the CTE-definition with the following code:

UPDATE dd
SET teststring = c.teststring
FROM #dummyData as dd -- rejoin the base table for later update usage
INNER JOIN cte as c
        ON dd.id = c.id
-- Join to get only the latest iteration
INNER JOIN (
            SELECT id, MAX(iteration) as maxIteration
            FROM cte 
            GROUP BY id
        ) as onlyMax
    ON c.id = onlyMax.id
    AND c.iteration = onlyMax.maxIteration

If you don't want to run it on a complete table set, you can run the following code for a single variable:

DECLARE @string nvarchar(max) = N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>'
DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

WHILE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string) > 0 BEGIN
    SELECT @string = STUFF(
                    @string,
                    PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag),
                    (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',@string)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag)),
                    N''
                )
END

SELECT @string
Ionic
  • 3,884
  • 1
  • 12
  • 33
  • Thank you so much Ionic, I'm still trying to figure out how all this magic works but your solution does exactly what I need for all occurrences of the tag! If I have other tags that I need to be scrubbed with a similar format (i.e. E11_01 and A11_02) I would only need to adjust the @starttag variable right? – James Vaughn III Jul 06 '15 at 15:54
  • 1
    No not just the starttag. You need to adjust both variables `@startag` and `@endtag`. :-) Both are needed to fix it. You can also work with an pattern.I designed the code to be flexible. You can try this too: `@starttag nvarchar(10) = N'<[A-Z][0-9][0-9][_][0-9][0-9]>'` The same for `@endtag` too (with the slash). This will match all tags with the same pattern like E11_01 and A11_02 etc. But not B99_9 (as it has just one number at the end, not two). But it's a benefit at all to use patterns like described above. :-) – Ionic Jul 06 '15 at 16:10
  • Perfect! Just one more question, since this is my first time learning about CTE's I'm a little confused since most CTE examples I've seen are used in conjunction with a SELECT statement. Can this CTE query be used for updating a column within the #dummyData table? I would like to update the column with the last iteration's output rather than just selecting it. Thanks Ionic! – James Vaughn III Jul 06 '15 at 18:10
  • 1
    Yes there is no problem with that. You can use a `CTE` like every other set. This means you can use it in an `UPDATE` or an `INSERT` too. You can for example join the base table (`#dummydata') after the `CTE` definition with the `CTE` itself and use the output of the `CTE` as the `SET` criteria of your `UPDATE`. You **cannot** update the `CTE` itself, but use it's outcome for the the `UPDATE`. I'll update the answer above for an example. – Ionic Jul 06 '15 at 20:02
  • I can't thank you enough Ionic, this was all extremely helpful! – James Vaughn III Jul 06 '15 at 22:30
  • Nice to hear that this helped you. :-) – Ionic Jul 07 '15 at 04:07