String Manipulation in TSQL

Question

I have a data in a table as shown in the first column (Input) and would like to update as shown in column (Results).

Please note there is a space on both sides of the operator (+, -, /, *).

Only these four arithmetic operators would be used. Except, operators and numerals, other words should be enclosed by [ ].

Please do help me to achieve this either using query or using user defined function or stored procedure.

Input and Output Expected is given below

I have tried spiting one of the input, for example, 'Pre Results + 10 - New Results' into Pre, Results, +, 10, -, New, Results. But I am not able to combine it back with additional [ and ] for the descriptions with braces like New Results should be [New Results] — Naveen, Oct 23 '17 at 10:25

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

This can be achieved in a single, set based statement without user defined functions or `while` loops, utilising four derived tables in a CTE, `for xml` and windowed functions.

The process is as follows:

Create a table with 10 rows in.
Self join that table to itself several times to generate a large number of rows (10 * 10 * 10 etc) and use row_number to create an incremental tally table.
Use the tally table to split the input string in a set based manner.
Use lag and lead to work out if you need to add a [ or ] character.
Concatenate the string back together.

declare @t table(id int,input varchar(155));
insert into @t values (1,'Total Deduction * 10'),(2,'Premium + 0.01'),(3,'100 / Final Results'),(4,'Pre Results + 10 - New Result');

with t(t) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
    ,n(n) as (select top(select max(len(input)) from @t) row_number() over (order by (select null)) from t t1, t t2, t t3, t t4, t t5)
    ,l as (select i.id
                 ,i.input
                 ,n.n
                 ,substring(i.input,n.n,1) as l
           from @t as i
             join n
               on len(i.input) >= n.n
          )
    ,w as (select *
                 ,case when n = 1 and isnumeric(l) = 0
                         then '[' + l
                       when l = ' '
                           and lag(l,1,'') over (partition by id order by n) in('+','-','*','/')
                           and isnumeric(lead(l,1,'') over (partition by id order by n)) = 0
                         then ' ['
                       when l = ' '
                           and isnumeric(lag(l,1,'') over (partition by id order by n)) = 0
                           and lead(l,1,'') over (partition by id order by n) in('+','-','*','/')
                         then '] '
                       when n = len(input) and isnumeric(l) = 0 then l + ']'
                       else l
                     end as w
           from l
          )
select id
      ,input
      ,(select w
        from w as w2
        where w.id = w2.id
        for xml path(''), type
       ).value('.','nvarchar(max)') as Results
from w
group by id
        ,input
order by id
        ,input;

Output:

+----+-------------------------------+-----------------------------------+
| id |             input             |              Results              |
+----+-------------------------------+-----------------------------------+
|  1 | Total Deduction * 10          | [Total Deduction] * 10            |
|  2 | Premium + 0.01                | [Premium] + 0.01                  |
|  3 | 100 / Final Results           | 100 / [Final Results]             |
|  4 | Pre Results + 10 - New Result | [Pre Results] + 10 - [New Result] |
+----+-------------------------------+-----------------------------------+

Just to make a note of it, it is SQL Server 2012 + only. – Amit Kumar Singh Oct 24 '17 at 04:31 — Amit Kumar Singh, Oct 24 '17 at 04:31

Amit Kumar Singh · Answer 2 · 2017-10-23T11:46:46.237

Create function PatternStringLoop from SQL Server Central in your database.

I am pasting the same here for easy access.

-- PatternSplitLoop will split a string based on a pattern of the form 
-- supported by LIKE and PATINDEX 
-- 
-- Created by: Dwain Camps 11-Oct-2012 
CREATE FUNCTION [dbo].[PatternSplitLoop]
 (  @String    VARCHAR(400)
   ,@Pattern   VARCHAR(500)
  ) RETURNS 
  @Results TABLE ( ItemNumber  INT
                  ,Item        VARCHAR(400)
                  ,[Matched]   INT     )
   WITH SCHEMABINDING 
AS 
BEGIN;

-- DECLARE a couple of variables we'll need in our loop     
DECLARE 
  @ItemNumber  INT = 0 
, @Remaining   VARCHAR(400) = ISNULL(@String, '')         
-- Create the "not pattern" 
, @NotPattern  VARCHAR(500) = REPLACE(REPLACE(@Pattern, '[', '[^'), '^^', '')
, @Matched     INT

IF @String IS NULL OR @Pattern IS NULL 
  INSERT INTO @Results SELECT NULL, NULL, NULL

WHILE DATALENGTH(@Remaining) > 0     
 BEGIN
   SELECT @ItemNumber = @ItemNumber + 1
   -- The item returned from the cascaded CROSS APPLY b below               
    ,@String = CASE
            -- When a+b = 1, then either a=1 and b=0 
            --(the pattern was found but not pattern                     
            -- was not found) or a=0 and b=1 
            -- (the not pattern was found but pattern was           
            -- not found).
            -- This means that no meaninful patterns 
            --are found in what remains so we’re done.   
            WHEN a+b = 1 THEN @Remaining   
            -- This case returns the chunk up to the 
            -- start of the next pattern/not pattern  
               WHEN (a=1 AND b>0) OR (b=1 AND a>0)                                  
              THEN SUBSTRING(@Remaining, 1, CASE a 
                                  WHEN 1 THEN b 
                                  ELSE a 
                                 END - 1)
               ELSE @Remaining                            
                END 
           ,@Matched=CASE a WHEN 1 THEN 1 ELSE 0 END                    
            FROM (
       -- Find the next occurrence of the Pattern and the NotPattern
        SELECT PATINDEX(@Pattern, @Remaining)
             , PATINDEX(@NotPattern, @Remaining)
           ) a(a, b)

       -- Now that we have our ItemNumber and Item (in @String) 
        -- INSERT them into our results
       INSERT INTO @Results SELECT @ItemNumber, @String, @Matched

        -- Find the remaining characters in the string         
        SELECT @Remaining = CASE
                 WHEN DATALENGTH(@Remaining) = DATALENGTH(@String) THEN ''
                 ELSE SUBSTRING(@Remaining, DATALENGTH(@String)+1, 
              DATALENGTH(@Remaining)) 
         END
 END

RETURN
END

Next step, use CROSS APPLY over every VALUE field in Your database, and INSERT the result in a table variable.

The pattern %[a-zA-Z ]% looks for complete text only string in the field with a space.

DECLARE @data TABLE 
(
 ItemNumber INT, 
 Value VARCHAR(500),
 Formatted VARCHAR(100), 
 Combined VARCHAR(500)
)

INSERT INTO @data
SELECT ItemNumber, Input, CASE WHEN Matched = 1 
AND LTRIM(RTRIM(Item)) != '' THEN '[' + LTRIM(RTRIM(Item)) + ']' 
ELSE Item END as Arranged, '' 
FROM [YOUR_TABLE]
  CROSS APPLY PatternSplitLoop(Input, '%[a-zA-Z ]%')

Once you have this, you can get distinct values in input in a temporary table variable with a IDENTITY column, since we need to combine the results for distinct input values.

 DECLARE @tmp TABLE(
    ID INT IDENTITY(1,1),
    DistinctValue VARCHAR(500)
  )

  INSERT INTO @tmp(DistinctValue)
  SELECT Distinct Value FROM @data

Loop through the temporary table variable with distinct values, and update the @data table variable with the resultant value.

DECLARE @minID INT = 1
  DECLARE @maxID INT = 0
  SELECT @maxID = COUNT(ID) FROM @tmp 
  DECLARE @currValue VARCHAR(500) = ''

  WHILE (@minID <= @maxID)
  BEGIN 

    SELECT @currValue = DistinctValue FROM @tmp WHERE ID = @minID

    Update @data Set Combined = substring(
        (
            Select ' '+Formatted  AS [text()] 
            From @data a WHERE Value = @currValue
            ORDER BY a.ItemNumber
            For XML PATH ('')
        ), 2, 1000) FROM @data a INNER JOIN @tmp b 
       ON a.Value = b.DistinctValue AND a.Value = @currValue
        WHERE ItemNumber = 1 

   SET @minID = @minID + 1
   SET @currValue = ''
  END

Select the resulting output

  SELECT Value AS Input, REPLACE(Combined, '&#x20;', ' ') As Result 
  FROM @data WHERE ItemNumber = 1

If you read that article you'll see that PatternSplitLoop is one of the the examples of how NOT to create a pattern-based splitter. PatternSplitCM (also included in the article) is the way to go. Nonetheless, you were going in the right direction with your solution. Note what I put together as it uses PatternSplitCM. — Alan Burstein, Oct 23 '17 at 23:39
@AlanBurstein I would disagree a bit here. It gives precise control over every stage of query processing. Also, it is not mentioned that it is NOT preferred, but performance wise comes second. So, user here would know at every stage how to proceed at each stage of query. From completion of task perspective, the query you have put forward actually does it in a single SELECT which is better. — Amit Kumar Singh, Oct 24 '17 at 04:28

score 0 · Answer 3 · answered Oct 23 '17 at 12:04

First just strip the alphabetical characters from string like here and then split the row that contains multiple values into rows like here. After that just update your table

create table #tbl1 (
    id int
    ,input varchar(155)
)

insert into #tbl1 values (1,'Total Deduction * 10')
insert into #tbl1 values (2,'Premium + 0.01')
insert into #tbl1 values (3,'100 / Final Results')
insert into #tbl1 values (4,'Pre Results + 10 - New Result')

;with cte as (
    SELECT
        LTRIM(RTRIM((SELECT CAST(CAST((
            SELECT SUBSTRING(input, Number, 1)
            FROM master..spt_values
            WHERE Type='p' AND Number <= LEN(input) AND
                SUBSTRING(input, Number, 1) LIKE '%[a-z ]%' FOR XML Path(''))
        AS xml) AS varchar(MAX))))) as new_input
    FROM #tbl1
) select
    LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))) AS new_input
into #tbl2
FROM
(
    SELECT CAST('<XMLRoot><RowData>' + REPLACE(new_input,'  ','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
    FROM cte
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

declare @new_input as varchar(15)
while exists (select new_input from #tbl2)
begin
    select top 1 @new_input = new_input from #tbl2

    update #tbl1 set input = REPLACE(input, @new_input, '[' + @new_input + ']')

    delete from #tbl2 where new_input = @new_input
end

select * from #tbl1

drop table #tbl2
drop table #tbl1

score 0 · Answer 4 · answered Oct 23 '17 at 14:50

What about this:

declare @t table(id int,input varchar(155));

insert into @t values (1,'Total Deduction * 10'),(2,'Premium + 0.01'),(3,'100 / Final Results'),(4,'Pre Results + 10 - New Result');

SELECT *
      ,[dbo].[fn_Utils_RegexReplace] (input, '[a-zA-z]{1}[a-zA-z\s]+[a-zA-z]{1}', '[$&]')
FROM @t;

How to get such nice functions? You need to read a little, but generally its easy - SQL CLR Functions. Once having such functions you can solve various of tasks easier.

score 0 · Answer 5 · answered Oct 23 '17 at 15:51

As @iamdave eluded to, Set-based is the way to go here; No loops necessary.

If you don't feel like messing with a Regex CLR you could grab a copy of PatternSplitCM and do this:

-- sample data:
declare @t table(id int,input varchar(155));
insert into @t values (1,'Total Deduction * 10'),(2,'Premium + 0.01'),
                      (3,'100 / Final Results'),(4,'Pre Results + 10 - New Result');    
-- solution:
select original  = input, 
       newString = 
  (
    select case when [matched]=1 and v.item <> ' ' then ' ['+v.item+'] ' else ps.item end
    from dbo.patternSplitCM(t1.input, '[a-zA-Z ]') ps
    cross apply (values (rtrim(ltrim(ps.Item)))) v(item)
    order by ItemNumber
    for xml path(''), type
  ).value('(text())[1]', 'varchar(8000)') 
from @t t1;

Results

original                        newString
------------------------------- ---------------------------------
Total Deduction * 10            [Total Deduction] * 10
Premium + 0.01                  [Premium] + 0.01
100 / Final Results             100 / [Final Results] 
Pre Results + 10 - New Result   [Pre Results] + 10 - [New Result]

String Manipulation in TSQL

5 Answers5

This can be achieved in a single, set based statement without user defined functions or while loops, utilising four derived tables in a CTE, for xml and windowed functions.

This can be achieved in a single, set based statement without user defined functions or `while` loops, utilising four derived tables in a CTE, `for xml` and windowed functions.