0

I have some data that I'm looking at that has text formatting stored within a NTEXT field. Happy enough with SQL Replace to remove data of a known length and format, however there are some fields with what looks like colour formatting and I'm trying to find a way to remove these.

An example of the data below, however (if possible) I would like to be able to remove whatever numbers follow the colours in the data but can't see how to introduce a wildcard into the replace statement. Something like '\red***\green\***\blue***' as per Excel, but this doesn't work in Sql Server.

declare @str varchar(1500) = '\red3\green73\blue125;Jimmy Jazz\red31\green73\blue125;'
select @str,
replace(@str,'\red31\green73\blue125;','')

Any pointers would be gratefully received, thanks in advance.

Dale K
  • 25,246
  • 15
  • 42
  • 71
JonTout
  • 618
  • 6
  • 14
  • 3
    `ntext` is obsolete and will be removed from SQL Server. I would suggest that you fix the database to use `nvarchar(max)`. – Gordon Linoff Nov 19 '20 at 18:27
  • True, however it's a third party school system and that change is scheduled for the next update. – JonTout Nov 19 '20 at 19:34

1 Answers1

0

Based on your sample data it would appear that you only need to remove the numbers in your string you can use patreplace8k or using patextract8K. Note the sample data and examples below:

-- Sample data
DECLARE @strings TABLE(stringId INT IDENTITY, string VARCHAR(100));
INSERT @strings VALUES('DeepPurple1978\yellow2\red009;pink\black3322'),
('red202\yellow5\red009;hotpink2'),('purple999\gray65\violet;blue\yellow381');

--==== Solution #1 Patreplace8k
SELECT
  s.stringId,
  pr.newString
FROM        @strings AS s
CROSS APPLY samd.patReplace8K(s.string,'[0-9]','') AS pr;

--==== Solution #2 PatExtract8k + STRING_AGG (SQL 2017+)
SELECT
  s.stringId,
  NewString = STRING_AGG(pe.Item,'') WITHIN GROUP (ORDER BY pe.ItemNumber)
FROM        @strings AS s
CROSS APPLY samd.patExtract8K(s.string,'[0-9]') AS pe
GROUP BY    s.stringId;

--==== Solution #3 PatExtract8k + XML Concatination (Pre SQL 2017\)
SELECT 
  s.stringId,
  NewString = 
(
  SELECT      pe.item+''
  FROM        @strings AS s2
  CROSS APPLY samd.patExtract8K(s2.string,'[0-9]') AS pe
  WHERE       s.stringId = s2.stringid
  ORDER BY    pe.itemNumber
  FOR XML PATH('')
) 
FROM        @strings AS s
GROUP BY    s.stringId;

Each of these solutions return:

stringId    NewString
----------- -------------------------------------
1           DeepPurple\yellow\red;pink\black
2           red\yellow\red;hotpink
3           purple\gray\violet;blue\yellow

The second and third leverage concatenation, the second compatible with SQL Server 2017+ the third works with earlier versions (you did not include what version you are on.)

To only strip the numbers that follow one or more pre-defined colors you could use patternsplitCM. Note the use of a table with a group of colors your are seeking; in the real world I'd use a real table.

-- Colors
DECLARE @colors TABLE(color VARCHAR(20) PRIMARY KEY);
INSERT  @colors VALUES('red'),('green'),('blue'),('yellow'),('purple'),('grey');

-- Sample data
DECLARE @strings TABLE(stringId INT IDENTITY, string VARCHAR(100));
INSERT @strings VALUES('Burger1978\yellow2\red009;pink\86thisfool'),
('red202\yellow5\red009;Freddy99'),('green999\grey65\violet;blue\yellow381');


SELECT 
  s.stringId, s.string, NewString = 
(
  SELECT
  (
    SELECT SUBSTRING(f.Item, IIF(f.M=0 AND EXISTS (SELECT c.Color FROM @colors AS c
                   WHERE c.Color = f.L),NULLIF(PATINDEX('%[^0-9]',f.item),0),1),8000)
    FROM
    (
      SELECT ps.ItemNumber, ps.Item, ps.[Matched],
             LAG(ps.Item,1,ps.Item) OVER (ORDER BY ps.ItemNumber)
      FROM   dbo.PatternSplitCM(s.string,'[^0-9\ ;]') AS ps
    ) AS f(ItemNumber,Item,M,L)
    ORDER BY f.ItemNumber
    FOR XML PATH(''), TYPE
  ).value('(text())[1]','varchar(8000)')
)
FROM @strings AS s;

Returns:

stringId    string                                        NewString
----------- --------------------------------------------- ----------------------------------------
1           Burger1978\yellow2\red009;pink\86thisfool     Burger1978\yellow\red;pink\86thisfool
2           red202\yellow5\red009;Freddy99                red\yellow\red;Freddy99
3           green999\grey65\violet;blue\yellow381         green\grey\violet;blue\yellow
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
  • Thanks Alan, that looks useful. What I really want to do is replace the colours and numbers, but as the colour values range from 1 to 255, I need either 1,2 or 3 wildcards. I'll have a play with your solution though - might just have to remove the numbers, then the colours? – JonTout Nov 21 '20 at 18:45