0

I have got a table where one of the column has text mixed with HTML data. This was due to a bug (fixed) in a script, but the SQL data needs to be edited to reflect the changes. The column has a type nvarchar(max, null). All I need to do is find tags such as <a img="lkss">,<div attr=val> and replace them with empty string "". I looked into this, but the solution says how to replace the entire contents based on one pattern. My problem is that I need to replace contents partly, but preserve clean text (i.e. not HTML tags/attributes). Any suggestion/help is appreciated.

Test column data:

<div attr=val; color=somecolor>inside text<div some=val><a some=val>Inside anchor

Expected result:

inside textInside anchor
Community
  • 1
  • 1
ha9u63a7
  • 6,233
  • 16
  • 73
  • 108
  • I doubt you had a problem with the regex itself (`<[^<>]+>` or `<.*?>`). What tool/language/environment are you using to do this? Please share the code if you have it. – Wiktor Stribiżew Jun 29 '15 at 08:22
  • @stribizhev I don't have a problem with regex itself, but don't know what the correct regex will be to correct the data mentioned above. I am in SQL Server 2012 – ha9u63a7 Jun 29 '15 at 08:43
  • What about [Regex pattern inside SQL Replace function?](http://stackoverflow.com/a/23001916/3832970) post? Must be helpful. – Wiktor Stribiżew Jun 29 '15 at 09:09

1 Answers1

1

I use a CTE like this:

DECLARE @str nvarchar(max) = '<div attr=val; color=somecolor>inside text<div some=val><a some=val>Inside anchor';

WITH CTE(myStr) AS (
    SELECT @str
    UNION ALL
    SELECT REPLACE(mystr, SUBSTRING(myStr, CHARINDEX('<', myStr, 1),  CHARINDEX('>', myStr, 1) - CHARINDEX('<', myStr, 1) + 1), '')
    FROM CTE
    WHERE PATINDEX('%<%>%',myStr) > 0
)
SELECT myStr
FROM CTE
WHERE PATINDEX('%<%>%',myStr) = 0

I suggest you to use it in a SVF like this:

CREATE FUNCTION tagRemover 
(
    @str nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @ResultVar nvarchar(max);
    SELECT @ResultVar = @str;

    ;WITH CTE(myStr, id) AS (
        SELECT @str, 1
        UNION ALL
        SELECT REPLACE(mystr, SUBSTRING(myStr, CHARINDEX('<', myStr, 1),  CHARINDEX('>', myStr, 1) - CHARINDEX('<', myStr, 1) + 1), ''), id + 1
        FROM CTE
        WHERE PATINDEX('%<%>%',myStr) > 0
    )
    SELECT @ResultVar = myStr
    FROM CTE
    WHERE PATINDEX('%<%>%',myStr) = 0;

    RETURN @ResultVar;
END
GO
shA.t
  • 16,580
  • 5
  • 54
  • 111