0

The .NET function string.Trim trims a rather extensive set of whitespace characters. How would this exact behavior be emulated in the best way T-SQL?

LTRIM and RTRIM only trim the space character which is not enough.

This would be easy with SQL CLR but that is not always possible and also has quite a high invocation cost. So I would like to avoid SQL CLR.

Also, scalar UDFs force serial plans and are slow to invoke. So it probably should not be a scalar UDF, either.

Given these constraints what's the best way to go about this?

boot4life
  • 4,966
  • 7
  • 25
  • 47
  • 1
    Could you give us an exemple of what is the output that you want ? if you have a string like this : `' toto tata '`, do you want it like this `'tototata'` ? – Vincent G Feb 06 '16 at 21:41
  • You only want to trim whitespace at the ends of the string, not within the string? For example, you don't want to replace multiple tabs within the string with a single space or remove them entirely. – HABO Feb 06 '16 at 22:41
  • I want exactly what `string.Trim` outputs which is removal of leading and trailing whitespace. – boot4life Feb 06 '16 at 23:17
  • I've updated my answer to include the .NET Framework 4 whitespace characters and do left, right and double-ended trims. (The `LEN` vs. trailing blanks feature had me going in circles for a while.) – HABO Feb 07 '16 at 05:45
  • Okay, now it has been updated to have a "Hello world!" example, handle NULLs, empty strings, all whitespace strings, ... . Didn't try right-to-left text. Still on track. – HABO Feb 07 '16 at 20:48
  • @HABO awesome. I wish more people were visiting this fairly esoteric question to upvote the answer more. – boot4life Feb 07 '16 at 23:06
  • I thought the OP might [accept](http://meta.stackoverflow.com/questions/5234/how-does-accepting-an-answer-work) the answer if, in fact, it is acceptable. – HABO Feb 08 '16 at 04:23

2 Answers2

6

This code provides a pattern that you can extend to handle the whitespace of your choice for a modified LTRIM.

declare @Tab as NVarChar(1) = NChar( 9 );
declare @Space as NVarChar(1) = NChar( 32 );

declare @Samples as Table ( String NVarChar(16) );
insert into @Samples ( String ) values
  ( 'Foo' ),
  ( @Tab + 'Foo' ),
  ( @Space + 'Foo' ),
  ( @Space + @Tab + 'Foo' ),
  ( @Tab + @Space + 'Foo' );
select String, Len( String ) as [Length], PatIndex( '%[^' + @Tab + @Space + ']%', String ) - 1 as [WhitespaceCount]
  from @Samples;

The REVERSE function can be used to implement a modified version of RTRIM.

NEWER UPDATE: The following code uses the list of whitespace characters as used in .NET Framework 4. It also works around the feature of LEN not counting trailing blanks.

declare @Tab as NVarChar(1) = NChar( 9 );
declare @Space as NVarChar(1) = NChar( 32 );

declare @Samples as Table ( String NVarChar(16) );
insert into @Samples ( String ) values
  ( 'Foo' ),
  ( @Tab + 'Foo' ),
  ( @Space + 'Foo' ),
  ( @Space + @Tab + 'Foo' ),
  ( @Tab + @Space + 'Foo' ),
  ( @Tab + 'Foo' + @Space ),
  ( @Space + 'Foo' + @Tab ),
  ( @Space + @Tab + 'Foo' + @Tab + @Space ),
  ( @Tab + @Space + 'Foo' + @Space + @Tab ),
  ( 'Foo' + @Tab ),
  ( NULL ),
  ( '           ' ),
  ( @Space + NULL + @Tab + @Tab ),
  ( '' ),
  ( 'Hello world!' );

declare @WhitespacePattern as NVarChar(100) = N'%[^' +
  NChar( 0x0020 ) + NChar( 0x00A0 ) + NChar( 0x1680 ) + NChar( 0x2000 ) +
  NChar( 0x2001 ) + NChar( 0x2002 ) + NChar( 0x2003 ) + NChar( 0x2004 ) +
  NChar( 0x2005 ) + NChar( 0x2006 ) + NChar( 0x2007 ) + NChar( 0x2008 ) +
  NChar( 0x2009 ) + NChar( 0x200A ) + NChar( 0x202F ) + NChar( 0x205F ) +
  NChar( 0x3000 ) + NChar( 0x2028 ) + NChar( 0x2029 ) + NChar( 0x0009 ) +
  NChar( 0x000A ) + NChar( 0x000B ) + NChar( 0x000C ) + NChar( 0x000D ) +
  NChar( 0x0085 ) + N']%';
-- NB: The   Len   function does not count trailing spaces.
--     Use   DataLength   instead.
with AnalyzedSamples as (
  select String, DataLength( String ) / DataLength( NChar( 42 ) ) as [StringLength],
    PatIndex( @WhitespacePattern, String ) - 1 as [LeftWhitespace],
    PatIndex( @WhitespacePattern, Reverse( String ) ) - 1 as [RightWhitespace]
  from @Samples ),
  TrimmedSamples as (
  select String, StringLength, [LeftWhitespace], [RightWhitespace],
    case
      when String is NULL then NULL
      when LeftWhitespace = -1 then N''
      else Substring( String, LeftWhitespace + 1, StringLength - LeftWhitespace )
      end as [LTrim],
    case
      when String is NULL then NULL
      when RightWhitespace = -1 then N''
      else Reverse( Substring( Reverse( String ), RightWhitespace + 1, StringLength - RightWhitespace ) )
      end as [RTrim],
    case
      when String is NULL then NULL
      when LeftWhitespace = -1 then N''
      else Substring( String, LeftWhitespace + 1, StringLength - LeftWhitespace - RightWhitespace )
      end as [Trim]
    from AnalyzedSamples )
  select N'"' + String + N'"' as [String], StringLength, [LeftWhitespace], [RightWhitespace],
    N'"' + [LTrim] + N'"' as [LTrim], DataLength( [LTRIM] ) / DataLength( NChar( 42 ) ) as [LTrimLength],
    N'"' + [RTrim] + N'"' as [RTrim], DataLength( [RTRIM] ) / DataLength( NChar( 42 ) ) as [RTrimLength],
    N'"' + [Trim] + N'"' as [Trim], DataLength( [TRIM] ) / DataLength( NChar( 42 ) ) as [TrimLength]
    from TrimmedSamples;
HABO
  • 15,314
  • 5
  • 39
  • 57
  • Fantastic answer. You might want to add test cases for purely whitespace strings and maybe one for a string with a whitespace "hole" in the middle. In any case this is on track to being accepted :) – boot4life Feb 07 '16 at 12:11
1

I'll be interested to see if anyone finds a generic SQL solution.

The best I can come up with is a simple REPLACE function:

SELECT MyString = LEFT(MyString, LEN(RTRIM(REPLACE(REPLACE(REPLACE(MyString COLLATE Latin1_General_100_BIN2, NCHAR(9), ' '), NCHAR(12), ' '), NCHAR(13), ' ')))) AS RTrimmed

SELECT MyString = RIGHT(MyString, LEN(LTRIM(REPLACE(REPLACE(REPLACE(MyString COLLATE Latin1_General_100_BIN2, NCHAR(9), ' '), NCHAR(12), ' '), NCHAR(13), ' ')))) AS LTrimmed

etc.

You can get the list of current whitespace characters here:

http://unicode.org/charts/uca/chart_Whitespace.html

Or, to prove it to yourself, you could export a list of all characters from SQL Server to something like Excel, clean the characters, and import them back in. Whatever was removed was whitespace.

JJ32
  • 1,034
  • 1
  • 7
  • 24
  • This would remove whitespace in the middle of the string. – boot4life Feb 06 '16 at 23:18
  • I can see how you might think that given the number of Replaces I used. It works the same as @HABO's example, however I think his is more succinct. – JJ32 Feb 06 '16 at 23:23