1

I have a ntext (or nvarchar(max)) column which contains HTML text for example:

<P>Hello <IMG title="" border=0 alt="" src="files/1.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/folder1/2.JPG"></P>
<P>&nbsp;</P> World!
<P><IMG title="" border=0 alt="" src="files/folder2/files/3.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/4.JPG"></P>

Now I need to replace each path of the src from files/ to files/new/ (or to other path) but only those with path of files/.

a REPLACE won't work (I think) because it would change also files/folder1/ and files/folder2/.

In the above example, I need to only change the path for 1.JPG and 4.JPG.

How can I do it? (The question is specific to sql-server).

Note: The file names/content can vary. the above HTML is just an example.

zig
  • 4,524
  • 1
  • 24
  • 68
  • TSQL is not built for this, i.e. it's not possible. Ideally this would be done in your domain/logic layer of your application, not in the data layer. – rory.ap Nov 29 '16 at 13:54
  • It should be easy to change from files to files/new – Ram Grandhi Nov 29 '16 at 13:55
  • @RamGrandhi, read the question again. if I simply change `files` to `files/new` all 4 paths will be changed. – zig Nov 29 '16 at 14:04
  • Is this a one-off update, or does your table store template HTML which is constantly updated? – David Rushton Nov 29 '16 at 14:18
  • @destination-data, the content can change. is there any difference? – zig Nov 29 '16 at 14:38
  • Just thinking, if the HTML is only a template does it need to be valid? Can you use a format like: `src=#fileLocation001#/1.jpg`. This provides a handy replace target. If a one-off I would use another language or find/replace. If neither of these cases apply how often will you need to update? Perhaps C#, using regex, inside an SSIS package is the way to go? – David Rushton Nov 29 '16 at 15:30
  • Did you already try to loop through your nvarchar(max) value linewise, analyse the line e.g. by counting the / between src and JPG and writing a new string? Afterwards you can writhe this new string to the DB, replacing the original Value... – Tyron78 Nov 29 '16 at 17:02

3 Answers3

2

Following a quick approach in order to replace a certain pattern within a nvarchar(max) string. To be more precisely: the code takes a nvarchar(max) variable @x as source and writes the transformed string into a new variable @y. However, I wrote only the part of the string replacement - the "Update" still has to be coded.

DECLARE @x nvarchar(max) = '<P>Hello <IMG title="" border=0 alt="" src="files/1.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/folder1/2.JPG"></P>
<P>&nbsp;</P> World!
<P><IMG title="" border=0 alt="" src="files/folder2/files/3.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/4.JPG"></P>'

DECLARE @brPos int = (SELECT CHARINDEX('</P>', @x));
DECLARE @brPosPrev int = 0;

DECLARE @srcPos int;
DECLARE @SlashPos int;
DECLARE @JPGPos int;

DECLARE @y nvarchar(max) = '';

DECLARE @xPart nvarchar(max);

WHILE (@brPos != 0)
BEGIN
  SET @xPart = SUBSTRING(@x, CASE WHEN @brPosPrev = 0 THEN @brPosPrev ELSE @brPosPrev END, @brPos-CASE WHEN @brPosPrev = 0 THEN @brPosPrev ELSE @brPosPrev END+4)

  SET @srcPos = (SELECT CHARINDEX('src="files/', @xPart))+11;
  SET @JPGPos = (SELECT CHARINDEX('.JPG', @xPart));
  SET @SlashPos = (SELECT CHARINDEX('/', @xPart, @srcPos));

  IF (@JPGPos < @SlashPos OR @SlashPos = 0)
    SET @xPart = REPLACE(@xPart, 'src="files/', 'src="files/new/');

  SET @y = @y + @xPart;

  SET @brPosPrev = @brPos + 4;
  SET @brPos = (SELECT CHARINDEX('</P>', @x, @brPosPrev));

END

SELECT @y

The Patterns I used in the CHARINDEX calls can be moved to variables as well - so you are not bound to use .JPG or whatever.

Tyron78
  • 4,117
  • 2
  • 17
  • 32
  • `@y` output has 8 `IMG` tags instead of 4. did I missed anything? I'm also not sure why the `CHAR(10)` is needed. I will read the code more closely... – zig Nov 30 '16 at 11:38
  • The char(10) is for the linebreaks in the nvarchar(max). You are right - to many tags... guess I missed something - give me a minute. I will check. – Tyron78 Nov 30 '16 at 12:03
  • there is not necessary a line breaks in the content. it could be just `

    Hello `. I'm also trying to follow your idea but my start pos is `"files/` and end pos is `"`. I'm not so concerned about the `src=` part.

    – zig Nov 30 '16 at 12:24
  • I updated the code and use as split criteria... at least @y looks OK now – Tyron78 Nov 30 '16 at 12:49
  • Thanks, but the `` is now removed completely. I still think the split should be done with the closing `"` – zig Nov 30 '16 at 13:03
  • Oops, I missed to extend the length of the substring by 4 (in order to get as well). However, using the " as split criteria would lead to new lines after alt=" and title=" as well. That's why I used . Code has been updated. – Tyron78 Nov 30 '16 at 13:16
  • unfortunately `

    ` or `

    ` might not even be in the content as I already said. it could be just `Hello ` or `
    Hi`. note that the src might not be the last attribute. but the pattern `"files/[something]"` is always valid. I'm still trying to adjust your code without success.
    – zig Nov 30 '16 at 14:38
  • I'll accept your answer because your approach helped me, and I have a solution now. thanks :) – zig Nov 30 '16 at 16:51
1

My solution:

DECLARE @input nvarchar(max) = 
'<P>Hello <IMG title="" border=0 alt="" src="files/zzz/1.JPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 alt="" src="files/2.PNG"></P>
<P>&nbsp;</P> World!
<P><IMG title="" border=0 alt="" src="files/folder2/files/3.JEPG"></P>
<P>&nbsp;</P>
<P><IMG title="" border=0 src="files/4.PNG" alt=""></P>'


DECLARE @path_old nvarchar(max) = 'files/';
DECLARE @path_new nvarchar(max) = 'files/new/';

DECLARE @i int, @j int;
DECLARE @path_len int = LEN(@path_old);
DECLARE @input_len int = LEN(@input);
DECLARE @start_location int = 1;
DECLARE @output nvarchar(max) = '';
DECLARE @p nvarchar(max);

WHILE 1 = 1 
BEGIN
    SET @i = CHARINDEX('"' + @path_old, @input, @start_location) 
    IF @i <> 0 
    BEGIN
        SET @j = CHARINDEX('"', @input, @i + 1);
        SET @p = SUBSTRING(@input, @i + @path_len + 1, @j - @i - @path_len);
        IF CHARINDEX('/', @p) = 0  
            SET @output = @output + SUBSTRING(@input, @start_location - 1, @i + 2 - @start_location) + @path_new + @p 
        ELSE
            SET @output = @output + SUBSTRING(@input, @start_location - 1, @j + 2 - @start_location);           

        SET @start_location = @j + 2    
        IF @start_location >= @input_len BREAK;
    END
    ELSE
    BEGIN
        SET @output = @output + SUBSTRING(@input, @start_location - 1, @input_len - @start_location + 2);
        BREAK;  
    END
END

PRINT @output
zig
  • 4,524
  • 1
  • 24
  • 68
0

Method 1 - 1 or 2 known file names

If you only need to replace paths for 1.JPG and 4.JPG, then you can write an UPDATE statement:

UPDATE t
    SET html = REPLACE(REPLACE(html, 'src="files/4.JPG"', 'src="files/new/4.JPG"'), 'src="files/1.JPG"', 'src="files/new/1.JPG"')
FROM t
WHERE CHARINDEX('src="files/1.JPG"', html) + CHARINDEX('src="files/4.JPG"', html) > 0;

Or, if you'd rather keep them separate:

UPDATE t
    SET html = REPLACE(html, 'src="files/1.JPG"', 'src="files/new/1.JPG"')
FROM t
WHERE CHARINDEX('src="files/1.JPG"', html) > 0;


UPDATE t
    SET html = REPLACE(html, 'src="files/4.JPG"', 'src="files/new/4.JPG"')
FROM t
WHERE CHARINDEX('src="files/4.JPG"', html) > 0;

Method 2 - potentially large list of files, whose names can be determined from the filesystem

If you need to replace the paths for a large number of files, whose file names can be determined from the filesystem, try this instead:

  1. Enable xp_cmdshell if required (see Enable 'xp_cmdshell' SQL Server).

  2. Create a temporary table to store the file names:

    CREATE TABLE #Images(
        Image_Name VARCHAR(100) NULL
    )
    
  3. List all the files.

    INSERT INTO #Images
    exec xp_cmdshell 'dir D:\Data\*.jpg /b'
    
  4. Remove NULL lines and anything else that's not required.

    DELETE FROM #Images
    WHERE Image_Name IS NULL
    
  5. Generate SQL statements for each file name.

    SELECT '
        UPDATE t
            SET html = REPLACE(html, ''src="files/' + Image_Name + '"'', ''src="files/new/' + Image_Name + '"'')
        FROM t
        WHERE CHARINDEX(''src="files/' + Image_Name + '"'', html) > 0;
    '
    FROM #Images
    
  6. Now simply copy-paste the output from executing the above SELECT statement back and execute the UPDATE queries.

Community
  • 1
  • 1
Serge
  • 3,986
  • 2
  • 17
  • 37
  • 1
    Surely you need to understand that the content is dynamic. this HTML was just an example. 4.JPG can be foo.png or other... – zig Nov 29 '16 at 14:32
  • 1
    I know. If it could be done **easily** I would have done it myself :))) – zig Nov 29 '16 at 14:39