4

We have the below query which was written to replace any & in a memo field with the HTML equivalent (&). When written, we did not consider that there are potentially other HTML tags in the field which also begin with “&” (i.e. — " etc). Since we must ensure that all ampersands are the HTML equivalent when used alone and not part of another tag we must skip those which at part of another tag. That said, the shortest HTML tag which can start with & seems to be 3 characters, and the longest seems to be six characters, so & _ _ _ ; to & _ _ _ _ _ _ ; in length ... are there any ideas on updating the where clause so that it does not update any & which are proceeded with a ";" in the next 4-7 characters after a &? Thank you.

 UPDATE STOCKMEM
 SET INETFDESC = CAST(
                       REPLACE(
                               REPLACE(
                                       CAST(INETFDESC as NVarchar(MAX))
                               ,'&','&')
                       , '&', ,'&')AS NText)
  WHERE INETFDESC LIKE '%&[^amp;]%' 
Justin
  • 229
  • 1
  • 3
  • 11
  • Have you considered using RegEx? http://stackoverflow.com/questions/8928378/using-regex-in-sql-server – Svek Feb 22 '17 at 05:24
  • Not sure if that is an option, but if I'd have that problem I'd use SSIS with a C# script task – RoundFour Feb 22 '17 at 06:51
  • These things are usually handled better using a programming language then in sql. – ATC Feb 22 '17 at 07:09

3 Answers3

0

Probably not the best possible way to handle this problem, but...

You can use underscore _ as an indicator that there should be some character in that place, which effectively makes it a character-counter in a situation like this. Just a quick example:

SELECT REPLACE('This is &[^amp;] just a test.','&[^amp;]','&')
WHERE 'This is &[^amp;] just a test.' LIKE '%&___;%'

This will not return a value, because the string in the WHERE clause does not include & followed by three characters _ _ _ followed by a semi-colon.

SELECT REPLACE('This is &[^amp;] just a test.','&[^amp;]','&')
WHERE 'This is &[^amp;] just a test.' LIKE '%&_____;%' 

This will return a value, because the LIKE condition is met by the string in the WHERE clause : &_ _ _ _ _; (spacing added for clarity)

Maybe you could use that to your advantage?

3BK
  • 1,338
  • 1
  • 8
  • 11
0

This is not pretty, but it does the job, I think. The idea is to find all ampersands that are not part of an entity. Here, entities are assumed to be ampersand, one letter, some more characters then semi column.

set nocount on
--drop table #HtmlTest
select CONVERT( nvarchar(255) , 
  N'The & & z; HTML & replacement < > é ε test & a; ' )   as test
  into #HtmlTest

select test from #HtmlTest

declare @posStart int, @posStart1 int, @posStart2 int, @posEnd int, @isEntity bit
set @posStart = 1 

while (@posStart != 0)
  begin
    select @posStart1 = charindex('&', test, @posStart + 1)  from #HtmlTest
    select @posStart2 
        = patindex('%&[a-z]%;%', substring(test, @posStart + 1, 99999)) 
        + @posStart    from #HtmlTest
    set    @isEntity  = IIF(@posStart1 = @posStart2, 1, 0)
    select @posEnd    = charindex(';', test, @posStart1 + 1) from #HtmlTest

    set @posStart = @posStart1

    if (@isEntity = 0 and @posStart1 > 0) 
      begin
        update #HtmlTest 
          set test = SUBSTRING(test, 1, @posStart1 - 1) + '&' 
                   + SUBSTRING(test, @posStart1 + 1, 999999) 
        select test from #HtmlTest
        set @posStart += 4
      end
  end
select test from #HtmlTest
set nocount off
0

I think this will do the job:

 UPDATE STOCKMEM
 SET INETFDESC = CAST(
                 REPLACE(
                     CAST(INETFDESC as NVarchar(MAX)), '& ', '&amp ')
                 ) AS NText
              )

If & is part of any tag, it won't be followed by space, so replace every & followed by space to &amp followed by space.

Michał Turczyn
  • 32,028
  • 14
  • 47
  • 69