SQL Character Replace function extend

Question

I'am currently developing a program and i want to write a function which is accept a value in following format

"AAAA BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH"

I want to replace the spaces in above with "_" and need a output as showed following format (please note that this should happen only for string which is inside double quotes)

"AAAA_BBBB"  CCCC DDDD EEEE "FFFF_GGGG_HHHH"

So I got the correct answer to this in my previous question

DECLARE @In VARCHAR(50) = '"AAAA BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH"'

DECLARE @Quote SMALLINT = -1, @Index INT = 1, @Char CHAR(1)
WHILE @Index <= LEN(@In) BEGIN
    SET @Char = SUBSTRING(@In, @Index, 1)
    IF @Char = '"'
        SET @Quote = @Quote * -1
    IF @Char = ' ' AND @Quote > 0
        SET @In = STUFF(@In, @Index, 1, '_')
    SET @Index = @Index + 1
END

PRINT @In

and now I want to modify this with the following requirement

with the required format you can note that, underscore should be replace with the spaces which is only inside the enclosed double quotes, but now i get a wrong output if the user input was as following

"AAAA BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc

it gives output as

"AAAA_BBBB"  CCCC DDDD EEEE "FFFF_GGGG_HHHH_cccc

but it must give output only for the first enclosed double quote string,because other one is not closed, expected output should be as follows

"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc

This is silly, what you really should do, is storing the value with the format AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH and format it elsewhere. Maybe a computed field — t-clausen.dk, Oct 23 '14 at 07:27

score 2 · Accepted Answer · answered Oct 23 '14 at 07:47

Please refer this code. I think this code will give some hint. This code is not the tested code. I am posting answer Because at least you get some hint to find your solution.

DECLARE @In VARCHAR(50) = '"AAAA BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH IIIII'

DECLARE @Quote SMALLINT = -1, @Index INT = 1, @Char CHAR(1) ,@New_Char CHAR(100)
WHILE @Index <= LEN(@In) BEGIN
    SET @Char = SUBSTRING(@In, @Index, 1)
    IF @Char = '"'
        SET @Quote = @Quote * -1
        SET @New_Char = SUBSTRING(@In, @Index + 1,213)
    IF @Char = ' ' AND @Quote > 0 AND @New_Char like '%"%'
        SET @In = STUFF(@In, @Index, 1, '_')
    SET @Index = @Index + 1
END

PRINT @In

score 2 · Answer 2 · answered Oct 23 '14 at 11:50

Here are the modifications I would make

Once all replacements have been made, exit existing the loop (added WHILE loop condition and how logic for the quote recognition behaves to facilitate the exit)
Remove dependencies on any length (CHAR(100), SUBSTRING(...,213))

Code

DECLARE @In VARCHAR(50) = '"AAAA BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH IIIII"'

DECLARE @Quote SMALLINT = 0, @Index INT = 1, @Char CHAR(1)
WHILE @Quote < 2 AND @Index <= LEN(@In) BEGIN
    SET @Char = SUBSTRING(@In, @Index, 1)
    IF @Char = '"'
        SET @Quote = @Quote + 1
    IF @Char = ' ' AND @Quote = 1
        SET @In = STUFF(@In, @Index, 1, '_')
    SET @Index = @Index + 1
END

PRINT @In

Output

"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH IIIII"

score 1 · Answer 3 · edited May 23 '17 at 11:56

I'd approach this differently to your last answer, and avoid a loop. The first step is to generate a list of 50 sequential numbers (this number should match the length of your varchar). This will allow you to split the varchar into rows which are much easier to deal with:

DECLARE @s VARCHAR(50) = '"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc';

WITH Numbers (Number) AS -- 50 ROWS
(   SELECT  TOP (LEN(@s)) ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) N1 (N)
            CROSS JOIN (VALUES (1), (1), (1), (1), (1)) N2 (N)
)
SELECT  Number,
        Letter = SUBSTRING(@s, Number, 1)
FROM    Numbers;

This gives something like:

Number | Letter
-------+--------
   1   |   "
   2   |   A
   3   |   A

In a similar fashion you can use this to get 'Groups' of quotation marks:

DECLARE @s VARCHAR(50) = '"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc';

WITH Numbers (Number) AS -- 50 ROWS
(   SELECT  TOP (LEN(@s)) ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) N1 (N)
            CROSS JOIN (VALUES (1), (1), (1), (1), (1)) N2 (N)
)
SELECT  Number,
        GroupID = CEILING(ROW_NUMBER() OVER(ORDER BY Number) / 2.0)
FROM    Numbers 
WHERE   SUBSTRING(@s, Number, 1) = '"'

Which gives:

Number | GroupID
-------+--------
   1   |    1
  11   |    1
  29   |    2

You can the group by GroupID to get the start and end position of your groups of quotation marks:

DECLARE @s VARCHAR(50) = '"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc';

WITH Numbers (Number) AS -- 50 ROWS
(   SELECT  TOP (LEN(@s)) ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) N1 (N)
            CROSS JOIN (VALUES (1), (1), (1), (1), (1)) N2 (N)
), Grouping AS
(   SELECT  Number,
            GroupID = CEILING(ROW_NUMBER() OVER(ORDER BY Number) / 2.0)
    FROM    Numbers 
    WHERE   SUBSTRING(@s, Number, 1) = '"'
)
SELECT  StartPosition = MIN(Number),
        EndPosition = MAX(Number)
FROM    Grouping
GROUP BY GroupID
HAVING COUNT(*) = 2 -- ONLY INCLUDE WHERE THERE IS A START AND AN END

Now you can take this back to your original split, and replace any spaces in this range with an underscore:

DECLARE @s VARCHAR(50) = '"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc';

WITH Numbers (Number) AS -- 50 ROWS
(   SELECT  TOP (LEN(@s)) ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) N1 (N)
            CROSS JOIN (VALUES (1), (1), (1), (1), (1)) N2 (N)
), Letters AS
(   SELECT  Number,
            Letter = SUBSTRING(@s, Number, 1)
    FROM    Numbers
), Grouping AS
(   SELECT  Number,
            GroupID = CEILING(ROW_NUMBER() OVER(ORDER BY Number) / 2.0)
    FROM    Letters 
    WHERE   Letter = '"'
), Groups AS
(   SELECT  StartPosition = MIN(Number),
            EndPosition = MAX(Number)
    FROM    Grouping
    GROUP BY GroupID
    HAVING COUNT(*) = 2
)
SELECT  l.Number,
        Letter = CASE WHEN g.StartPosition IS NOT NULL AND l.Letter = ' ' THEN '_' ELSE l.Letter END
FROM    Letters AS l
        LEFT JOIN Groups AS g
            ON g.StartPosition <= l.Number
            AND g.EndPosition >= l.Number;

Which gives:

Number | Letter
-------+--------
   1   |   "
   2   |   A
   3   |   A
   4   |   A
   5   |   A
   6   |   _
   7   |   B
   8   |   B
   9   |   B
  10   |   B
  11   |   "
  12   |

Then you can rebuild your original string using FOR XML concatenation:

DECLARE @s VARCHAR(50) = '"AAAA_BBBB"  CCCC DDDD EEEE "FFFF GGGG HHHH cccc';

WITH Numbers (Number) AS -- 50 ROWS
(   SELECT  TOP (LEN(@s)) ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) N1 (N)
            CROSS JOIN (VALUES (1), (1), (1), (1), (1)) N2 (N)
), Letters AS
(   SELECT  Number,
            Letter = SUBSTRING(@s, Number, 1)
    FROM    Numbers
), Grouping AS
(   SELECT  Number,
            GroupID = CEILING(ROW_NUMBER() OVER(ORDER BY Number) / 2.0)
    FROM    Letters 
    WHERE   Letter = '"'
), Groups AS
(   SELECT  StartPosition = MIN(Number),
            EndPosition = MAX(Number)
    FROM    Grouping
    GROUP BY GroupID
    HAVING COUNT(*) = 2
)
SELECT  String = (  SELECT  CASE WHEN g.StartPosition IS NOT NULL AND l.Letter = ' ' THEN '_' ELSE l.Letter END
                    FROM    Letters AS l
                            LEFT JOIN Groups AS g
                                ON g.StartPosition <= l.Number
                                AND g.EndPosition >= l.Number
                    ORDER BY l.Number
                    FOR XML PATH(''), TYPE
                ).value('.', 'VARCHAR(50)');

Finally, I would enclose the logic in an inline table valued function:

CREATE FUNCTION dbo.YourFunctionName (@s VARCHAR(50))
RETURNS TABLE
AS
RETURN
(   WITH Numbers (Number) AS -- 50 ROWS
    (   SELECT  TOP (LEN(@s)) ROW_NUMBER() OVER(ORDER BY N1.N)
        FROM    (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) N1 (N)
                CROSS JOIN (VALUES (1), (1), (1), (1), (1)) N2 (N)
    ), Letters AS
    (   SELECT  Number,
                Letter = SUBSTRING(@s, Number, 1)
        FROM    Numbers
    ), Grouping AS
    (   SELECT  Number,
                GroupID = CEILING(ROW_NUMBER() OVER(ORDER BY Number) / 2.0)
        FROM    Letters 
        WHERE   Letter = '"'
    ), Groups AS
    (   SELECT  StartPosition = MIN(Number),
                EndPosition = MAX(Number),
                Letter = ' '
        FROM    Grouping
        GROUP BY GroupID
        HAVING COUNT(*) = 2
    )
    SELECT  String = (  SELECT  CASE WHEN l.Letter = g.Letter THEN '_' ELSE l.Letter END
                        FROM    Letters AS l
                                LEFT JOIN Groups AS g
                                    ON g.StartPosition <= l.Number
                                    AND g.EndPosition >= l.Number
                        ORDER BY l.Number
                        FOR XML PATH(''), TYPE
                    ).value('.', 'VARCHAR(50)')
);

Which can the be called as:

SELECT  t.s,
        Replaced = (SELECT String FROM dbo.YourFunctionName (t.s))
FROM    (VALUES 
            ('"AAAA_BBBB" CCCC DDDD EEEE "FFFF GGGG HHHH cccc'),
            ('"AAAA_BBBB" CCCC DDDD EEEE "FFFF GGGG HHHH cccc"'),
            ('"AAAA_BBBB" CCCC "DDDD EEEE" "FFFF GGGG HHHH cccc'),
            ('"AAAA_BBBB" CCCC "DDDD EEEE" "FFFF GGGG HHHH cccc"')
        ) AS t (s);

This may seem significantly more complicated, but hopefully I have explained the logic properly, and the big advantage with using an inline table valued function is that the definition is expanded out into the main query, unlike scalar udfs which have a cursor like execution.

Interesting approach. To add flexibility, you could even use recursive CTE to generate the numbers so if the LEN(@s) was 100 you don't have any extra work: `;WITH Numbers (Number) AS (SELECT 1 UNION ALL SELECT Number + 1 FROM Numbers WHERE Number <= LEN(@s)) SELECT * FROM Numbers OPTION (MAXRECURSION 0)` — Jason W, Oct 23 '14 at 11:59
Using a recursive CTE is an awful way to generate a sequence of numbers, if more rows were required I would just add more cross joins. For further information read the 3 part series [Generate a set or sequence without loops – part 1](http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1) — GarethD, Oct 23 '14 at 12:07
Awful sounds a bit extreme. It's not the fastest, but it's easy to follow and doesn't rely on manually controlled limits even with stacked CTE or cartesian products on system objects. — Jason W, Oct 23 '14 at 12:17
I can only assume you did not make it to the 2nd page of that article then where the recursive CTE was [25 x slower than any other method](http://cdn.sqlperformance.com/wp-content/uploads/2013/01/n_1MM_res1.png). I put the method in the same category as using a cursor, in some situations the cursor may be small enough to not impact performance, but this does not mean it is good practice to use it, therefore I never advocate the use of it even for small sets, because to do so opens the gate to people using it for large lists where the performance will be an issue. — GarethD, Oct 23 '14 at 12:25
Valid points, and I've read that entire article before. For known small sets like this, I usually prefer readability, but agree about the gateway where others can try to apply this bigger sets getting them into trouble. I'll concede :) — Jason W, Oct 23 '14 at 12:34

SQL Character Replace function extend

3 Answers3