2

I am working on writing a UDF that searches a string of text for different combinations of white space delimiters (i.e. Carriage Returns, Line Feeds, Spaces, etc.) and replaces them with a different non-white space delimiter.

I inserted all the possible combinations of delimiters in a table variable along with the new replacement delimiter; however, when I used the replace() function it wasn't matching my delimiters as I expected. It turns out if one row on my delimiters table had a delimiter that was 3 characters long, and other rows only had delimiters that were 2 characters long, it was inserting a trailing space on the rows with only 2 character delimiters.

Also I noticed if I added just one TRIM() around any single or combination of CHAR()s on just one row during the insert, it would remove the trailing space for all rows.

I am just trying to understand why the trailing spaces are being added during the insert, as well as why using the TRIM() on just one row during the insert is affecting all of the rows with trailing whitespace.

Using MSSQL Server 14.0.3445 RTM with SQL_Latin1_General_CP1_CI_AS Collation

Here is an example with comments that explains my issue:

DECLARE @delimiters TABLE (Id INT, delim VARCHAR(10), newDelim CHAR)

INSERT INTO @delimiters
VALUES
(1, CHAR(10) + CHAR(32) + CHAR(13), '$'),
(2, CHAR(10) + CHAR(13), '#'),
(3, CHAR(13) + CHAR(10), '@')

DECLARE @strings TABLE (Id INT, [str] VARCHAR(100))
INSERT INTO @strings
VALUES
(1, 'This is paragraph 1.' + CHAR(10) + CHAR(32) + CHAR(13) + 'This is paragraph 2.'),
(2, 'This is paragraph 1.' + CHAR(10) + CHAR(13) + 'This is paragraph 2.'),
(3, 'This is paragraph 1.' + CHAR(13) + CHAR(10) + 'This is paragraph 2.')


-- 2 & 3 dont match, but they should.
-- For some reason a trailing space was added to Id 2 & 3
-- in the @delimiters table at the time of the insert clause.
SELECT
    REPLACE(s.[str], d.delim, d.newDelim) AS NewStr
FROM @strings s
LEFT JOIN @delimiters d ON s.Id = d.Id


-- Adding a TRIM() fixes it
SELECT
    REPLACE(s.[str], TRIM(d.delim), d.newDelim) AS NewStr
FROM @strings s
LEFT JOIN @delimiters d ON s.Id = d.Id


-- Clear the @delimiters table
DELETE FROM @delimiters WHERE delim IS NOT NULL


-- Now Im adding a TRIM() during the insert.
-- You can literally add it around any
-- single or combination of CHAR()s in the delim column.
INSERT INTO @delimiters
VALUES
(1, TRIM(CHAR(10)) + CHAR(32) + CHAR(13), '$'),
(2, CHAR(10) + CHAR(13), '#'),
(3, CHAR(13) + CHAR(10), '@')

-- Now they all match, I only put the TRIM()
-- on the first CHAR() for ID=1 during the insert,
-- but it trimmed the trailing space for all of them.  
SELECT
    REPLACE(s.[str], d.delim, d.newDelim) AS NewStr
FROM @strings s
LEFT JOIN @delimiters d ON s.Id = d.Id

Result set I get:
Results

GSerg
  • 76,472
  • 17
  • 159
  • 346
Dontjoshme
  • 25
  • 4

1 Answers1

4

The VALUES clause essentially represents a table, that is, all values in each column must be of the same type - the one that covers the widest of the provided values.

The type of CHAR(10) + CHAR(32) + CHAR(13) is CHAR(3). It is the widest of the provided values, thus, the entire column is typed as char(3).
char, unlike varchar, requires padding with spaces, so your first VALUES clause inserts the second and the third delimiters padded with a space to the length of 3.

TRIM(), on the other hand, returns a varchar, which does not require padding with spaces. So including the TRIM in the VALUES makes the entire column varchar(3), and no spaces are added to the end of the last two lines.

GSerg
  • 76,472
  • 17
  • 159
  • 346
  • I'm not the OP - but had a look at this question. Your answer sounds good, but if you replace the first delimiter with `CHAR(10) + ' ' + CHAR(13)`, that would still be CHAR(3) and would have the same problem, wouldn't it? However, I think it seems to work. – seanb Jan 12 '23 at 00:33
  • 2
    @seanb `CHAR(10) + ' ' + CHAR(13)` [is `varchar(3)`](https://dbfiddle.uk/yu13EuAR). – GSerg Jan 12 '23 at 00:36
  • This is the root of the issue; you can confirm this if you `select * into from ` and then inspect the column type in `sys.columns` for the table – Stu Jan 12 '23 at 00:36
  • Ah that makes sense - the ' ' component is variable length so makes it a varchar. Nice - thumbs up! – seanb Jan 12 '23 at 00:37
  • So even though I declared delim as a VARCHAR(10) during the table decloration, its still treating it as a CHAR(3) until I add the TRIM()? – Dontjoshme Jan 12 '23 at 00:40
  • 1
    @Dontjoshme `VALUES` is a separate table in its own right, it does not assume the identity of the table you are copying its values to (nor can it in the general case). Similarly you would get an arithmetic overflow on `declare @d float = 2147483646 * 2`, even though the resulting value should fit in the `float`. It does not matter that you are eventually storing the result in the `float`, what matters is that you are multiplying two `int`s, the result of that is also typed as `int`, and the resulting value is bigger than what an `int` may contain. – GSerg Jan 12 '23 at 00:44
  • @GSerg Most impressive. +1 – John Cappelletti Jan 12 '23 at 00:57
  • 2
    @Dontjoshme - based on this answer, you could overcome the problem by explicitly making the delimiters into varchar e.g., inserting `(1, CAST(CHAR(10) + CHAR(32) + CHAR(13) AS VARCHAR(10)), '$')`. – seanb Jan 12 '23 at 01:01
  • Ok that makes more sense when you explain it that way. I had to read it about 5 times but it eventually clicked :-). Thanks! – Dontjoshme Jan 12 '23 at 01:01