136

I have the following test table in SQL Server 2005:

CREATE TABLE [dbo].[TestTable]
(
 [ID] [int] NOT NULL,
 [TestField] [varchar](100) NOT NULL
) 

Populated with:

INSERT INTO TestTable (ID, TestField) VALUES (1, 'A value');   -- Len = 7
INSERT INTO TestTable (ID, TestField) VALUES (2, 'Another value      '); -- Len = 13 + 6 spaces

When I try to find the length of TestField with the SQL Server LEN() function it does not count the trailing spaces - e.g.:

-- Note: Also results the grid view of TestField do not show trailing spaces (SQL Server 2005).
SELECT 
 ID, 
 TestField, 
 LEN(TestField) As LenOfTestField, -- Does not include trailing spaces
FROM 
 TestTable

How do I include the trailing spaces in the length result?

Jason Snelders
  • 5,471
  • 4
  • 34
  • 40
  • 4
    I think the real solution here might be for Microsoft to fix their broken software. Vote here: https://feedback.azure.com/forums/908035-sql-server/suggestions/34673914-add-setting-so-len-counts-trailing-whitespace – QA Collective Jun 27 '18 at 01:21

11 Answers11

150

This is clearly documented by Microsoft in MSDN at http://msdn.microsoft.com/en-us/library/ms190329(SQL.90).aspx, which states LEN "returns the number of characters of the specified string expression, excluding trailing blanks". It is, however, an easy detail on to miss if you're not wary.

You need to instead use the DATALENGTH function - see http://msdn.microsoft.com/en-us/library/ms173486(SQL.90).aspx - which "returns the number of bytes used to represent any expression".

Example:

SELECT 
    ID, 
    TestField, 
    LEN(TestField) As LenOfTestField,           -- Does not include trailing spaces
    DATALENGTH(TestField) As DataLengthOfTestField      -- Shows the true length of data, including trailing spaces.
FROM 
    TestTable
BenMorel
  • 34,448
  • 50
  • 182
  • 322
Jason Snelders
  • 5,471
  • 4
  • 34
  • 40
  • 58
    NOTE: For `DATALENGTH` you'll also need to divide the result by 2 if the expression being tested is a wide character type (Unicode; nchar, nvarchar or ntext), since the result is in *bytes*, not *characters*. – devstuff Jan 08 '10 at 07:27
  • Excellent point @devstuff and another potential gotcha for the unwary. – Jason Snelders Jan 08 '10 at 10:39
  • 8
    Also for `varchar` etc. this can be collation dependant and not even a straight forward division by 2 is reliable. See [example here](http://stackoverflow.com/questions/176514/what-is-the-difference-between-char-nchar-varchar-and-nvarchar-in-sql-server/8250586#8250586) – Martin Smith Jul 13 '13 at 11:33
  • 29
    I would use `LEN(REPLACE(expr, ' ', '_'))`. This should work with `varchar` and `nvarchar` and strings containing special unicode control characters. – Olivier Jacot-Descombes Apr 24 '14 at 16:01
  • 8
    -1, `DATALENGTH()` should not be considered an alternative way to count characters because it counts bytes instead of characters and this matters when representing the same string in `VARCHAR`/`NVARCHAR`. – binki Jul 16 '14 at 19:18
  • @binki devstuff already mentioned that his comment above. If you're going to deduct a person's reputation and make a statement that something should not be considered, then I ask you do it as a proper answer and provide explanation and reasoning for the answer, or give an alternative answer to help solve the original problem. – Jason Snelders Jul 16 '14 at 22:47
  • 1
    JasonSnelders, @Serge already provides [an alternative](http://stackoverflow.com/a/11080074/429091). – binki Jul 17 '14 at 13:27
  • 7
    Starting from SQL server 2012, unicode columns with version 100 collations now supports surrogate pairs. This means a single character may use up to 4 bytes, causing the divide by two trick to fail. See [msdn](https://msdn.microsoft.com/en-us/library/ms143726.aspx#Supplementary_Characters). – Frédéric May 26 '15 at 15:25
  • Here's a simple example of a surrogate pair that shows how the "divide by two" trick fails (making this whole answer unreliable): `SELECT LEN(N'' COLLATE Latin1_General_100_CI_AS_SC), DATALENGTH(N'' COLLATE Latin1_General_100_CI_AS_SC)`. The `LEN` gives 1, whilst `DATALENGTH` gives 4. – Douglas Nov 05 '15 at 18:19
  • Whether to count surrogate pairs as one or two characters is open to debate. (.NET's [`String.Length`](https://msdn.microsoft.com/en-us/library/system.string.length%28v=vs.110%29.aspx) counts them as two). However, if you need to use the computed length to perform further string operations in SQL, then you must abide by the `LEN` semantics (which count surrogate pairs as one character in Unicode `SC` collations). `SELECT STUFF(N'abce' COLLATE Latin1_General_100_CI_AS_SC, 4, 2, 'd')` causes the final `e` to be inadvertently lost due to the surrogate pair being assumed to have a length of 2. – Douglas Nov 05 '15 at 18:44
106

You can use this trick:

LEN(Str + 'x') - 1

Serge
  • 6,554
  • 5
  • 30
  • 56
  • 17
    Could you enlighten us with the better alternatives, please? Datalength sure isn't. – Serge Mar 25 '13 at 08:40
  • The top answer seems reasonable. For Unicode strings you probably have to divide by 2. L2S and EF do it that way. – usr Mar 25 '13 at 09:38
  • 19
    I strongly disagree that using a inconsistent method (in some case you divide its result by 2 and sometimes not) is a better option. Maybe is there a near to zero performance hit with my method though. – Serge Mar 25 '13 at 12:35
  • I think it's fine to use if you are just trying to determine if your column does, in fact, contain trailing spaces. It shouldn't be used in production code, but it is a good, quick test. I had a table with columns that used to be nchar, but are now nvarchar. It wasn't readily obvious in SSMS that they were padded, but this little trick proved my suspicion. – SouthShoreAK May 20 '13 at 18:54
  • 1
    @usr - And for this example? `DECLARE @T TABLE(C1 VARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS);INSERT INTO @T VALUES (N'中华人民共和国');SELECT LEN(C1) AS [LEN(C1)],DATALENGTH(C1) AS [DATALENGTH(C1)] FROM @T ` – Martin Smith Jul 13 '13 at 11:36
  • @MartinSmith interesting. `LEN` seems to take into account Unicode pairs?! .NET's `string.Length` would not. Now it depends on what the OP wants to know - code points (`DATALENGTH/2`) or characters (the crazy trick from this answer). – usr Jul 13 '13 at 19:40
  • 5
    @usr Serge's method is the best, IMHO. Simple and elegant. DATALENGTH is complicated: single/double byte type dependent, collation/language dependent, etc. – Mr. TA Aug 13 '13 at 18:19
  • @usr I actually wrapped it inside a UDF, which I know is even slower, but performance wasn't that big of a deal in my case, whereas code cleanliness always is. I do agree that there should be another builtin function that doesn't ignore trailing spaces. – Mr. TA Aug 14 '13 at 16:01
  • 12
    This is the best, elegant solution so far. I don't really care if it FEELS like a hack or not (coding is not about feelings), I'm really care about the fact that this solution has no side effects. I can change data type varchar/nvarchar and it still works. Good job. – Mike Keskinov Jan 07 '14 at 21:54
  • 9
    There is a caveat because of this side effect. If you're working with a variable of type nvarchar(4000), and your variable contains a 4000 character string, the added character will be ignored, and you'll get the wrong result (SQL's len that ignores trailing spaces, less the 1 you subtract). – hatchet - done with SOverflow Jan 01 '15 at 17:05
  • 1
    Just used this for a project but needed to check for leading blanks as well so I used `LEN('x' + Str + 'x') - 2` to check for leading blanks – Henesnarfel Apr 29 '15 at 19:55
  • 5
    @Henesnarfel: `LEN` already counts leading blanks, so there's no need for the extra prefix. – Douglas Nov 05 '15 at 18:12
  • 4
    @Serge this breaks for me if the input string is exactly at the max type length (e. g. `DECLARE @x VARCHAR(8000) = '{8000 chars}'; SELECT LEN(@x + 'x') -- returns 8000`). Seems like we need to wrap STR as `CONVERT(NVARCHAR(MAX), Str)` to be fully robust to any string types – ChaseMedallion Dec 02 '16 at 16:08
  • 2
    @hatchet I believe this will account for the caveat: LEN(CAST(Str AS nvarchar(MAX)) + 'x') - 1 – Daniel W. Sep 07 '18 at 17:18
30

I use this method:

LEN(REPLACE(TestField, ' ', '.'))

I prefer this over DATALENGTH because this works with different data types, and I prefer it over adding a character to the end because you don't have to worry about the edge case where your string is already at the max length.

Note: I would test the performance before using it against a very large data set; though I just tested it against 2M rows and it was no slower than LEN without the REPLACE...

TTT
  • 22,611
  • 8
  • 63
  • 69
16

"How do I include the trailing spaces in the length result?"

You get someone to file a SQL Server enhancement request/bug report because nearly all the listed workarounds to this amazingly simple issue here have some deficiency or are inefficient. This still appears to be true in SQL Server 2012. The auto trimming feature may stem from ANSI/ISO SQL-92 but there seems to be some holes (or lack of counting them).

Please vote up "Add setting so LEN counts trailing whitespace" here:

https://feedback.azure.com/forums/908035-sql-server/suggestions/34673914-add-setting-so-len-counts-trailing-whitespace

Retired Connect link: https://connect.microsoft.com/SQLServer/feedback/details/801381

crokusek
  • 5,345
  • 3
  • 43
  • 61
  • 2
    The `datalength` solution is even worse starting from SQL server 2012, since it does now supports surrogate pairs in UTF-16, meaning a character may use up to 4 bytes. It is really time they fix the `len` function for complying with ANSI, or at least provide a dedicated function for counting char including trailing spaces. – Frédéric May 26 '15 at 15:32
  • 1
    The feedback link needs to be used more for this. It is baffling that this problem can only be searched via internet. I spent nearly 2 hours trying to figure out where I had made a mistake in my own code before even considering that the LEN() function was the cause of my disconnect. – Takophiliac Apr 03 '19 at 21:02
  • I agree with this but should allow a parameter to trim out whitespace.. as it makes string comparisons with EF a lot easier, not having to check if there is whitespace included when the iqueryable expression is built. – ganjeii Apr 29 '20 at 21:22
11

There are problems with the two top voted answers. The answer recommending DATALENGTH is prone to programmer errors. The result of DATALENGTH must be divided by the 2 for NVARCHAR types, but not for VARCHAR types. This requires knowledge of the type you're getting the length of, and if that type changes, you have to diligently change the places you used DATALENGTH.

There is also a problem with the most upvoted answer (which I admit was my preferred way to do it until this problem bit me). If the thing you are getting the length of is of type NVARCHAR(4000), and it actually contains a string of 4000 characters, SQL will ignore the appended character rather than implicitly cast the result to NVARCHAR(MAX). The end result is an incorrect length. The same thing will happen with VARCHAR(8000).

What I've found works, is nearly as fast as plain old LEN, is faster than LEN(@s + 'x') - 1 for large strings, and does not assume the underlying character width is the following:

DATALENGTH(@s) / DATALENGTH(LEFT(LEFT(@s, 1) + 'x', 1))

This gets the datalength, and then divides by the datalength of a single character from the string. The append of 'x' covers the case where the string is empty (which would give a divide by zero in that case). This works whether @s is VARCHAR or NVARCHAR. Doing the LEFT of 1 character before the append shaves some time when the string is large. The problem with this though, is that it does not work correctly with strings containing surrogate pairs.

There is another way mentioned in a comment to the accepted answer, using REPLACE(@s,' ','x'). That technique gives the correct answer, but is a couple orders of magnitude slower than the other techniques when the string is large.

Given the problems introduced by surrogate pairs on any technique that uses DATALENGTH, I think the safest method that gives correct answers that I know of is the following:

LEN(CONVERT(NVARCHAR(MAX), @s) + 'x') - 1

This is faster than the REPLACE technique, and much faster with longer strings. Basically this technique is the LEN(@s + 'x') - 1 technique, but with protection for the edge case where the string has a length of 4000 (for nvarchar) or 8000 (for varchar), so that the correct answer is given even for that. It also should handle strings with surrogate pairs correctly.

  • 1
    Unfortunately, this answer no longer works for strings containing surrogate pairs in SQL Server 2012. Running your operation on `N'xx' COLLATE Latin1_General_100_CI_AS_SC` gives 4, whilst `LEN` gives 3. – Douglas Nov 05 '15 at 18:29
  • 9
    @Douglas - That's useful information. If only Microsoft would just give us a version of LEN that doesn't ignore trailing spaces. – hatchet - done with SOverflow Nov 05 '15 at 18:40
7

LEN cuts trailing spaces by default, so I found this worked as you move them to the front

(LEN(REVERSE(TestField))

So if you wanted to, you could say

SELECT
t.TestField,
LEN(REVERSE(t.TestField)) AS [Reverse],
LEN(t.TestField) AS [Count]
FROM TestTable t
WHERE LEN(REVERSE(t.TestField)) <> LEN(t.TestField)

Don't use this for leading spaces of course.

Joey
  • 95
  • 1
  • 1
  • 12
    Now it trims **leading** spaces instead of trailing spaces. Same day, different problem :) – Reversed Engineer Jul 25 '16 at 13:32
  • @DaveBoltman My suggestion is probably more convoluted still, but you could additionally compare against the TRIM'ed length. – Brian J Oct 11 '16 at 19:38
  • This reverses the bug where leading spaces are not counted instead of trailing spaces. See the following code: `declare @TestField varchar(10);` `SET @TestField = ' abc '; -- Length with spaces is 5.` `select LEN(REVERSE(@TestField)) -- Returns 4` `select LEN(@TestField) -- Returns 4` – Metalogic Aug 29 '17 at 17:26
5

You need also to ensure that your data is actually saved with the trailing blanks. When ANSI PADDING is OFF (non-default):

Trailing blanks in character values inserted into a varchar column are trimmed.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • 3
    I think you should not turn off ANSI PADDING as this setting is obsolete. Having it at a non-standard value causes many small problems. – usr Nov 17 '12 at 16:17
2

This is the best algorithm I've come up with which copes with the maximum length and variable byte count per character issues:

ISNULL(LEN(STUFF(@Input, 1, 1, '') + '.'), 0)

This is a variant of the LEN(@Input + '.') - 1 algorithm but by using STUFF to remove the first character we ensure that the modified string doesn't exceed maximum length and remove the need to subtract 1.

ISNULL(..., 0) is added to deal with the case where @Input = '' which causes STUFF to return NULL.

This does have the side effect that the result is also 0 when @Input is NULL which is inconsistent with LEN(NULL) which returns NULL, but this could be dealt with by logic outside this function if need be

Here are the results using LEN(@Input), LEN(@Input + '.') - 1, LEN(REPLACE(@Input, ' ', '.')) and the above STUFF variant, using a sample of @Input = CAST(' S' + SPACE(3998) AS NVARCHAR(4000)) over 1000 iterations

Algorithm DataLength ExpectedResult Result ms
LEN 8000 4000 2 14
+DOT-1 8000 4000 1 13
REPLACE 8000 4000 4000 514
STUFF+DOT 8000 4000 4000 0

In this case the STUFF algorithm is actually faster than LEN()!

I can only assume that internally SQL looks at the last character and if it is not a space then optimizes the calculation But that's a good result eh?

Don't use the REPLACE option unless you know your strings are small - it's hugely inefficient

Mr Surfy
  • 61
  • 5
1

You should define a CLR function that returns the String's Length field, if you dislike string concatination. I use LEN('x' + @string + 'x') - 2 in my production use-cases.

Robin Ellerkmann
  • 2,083
  • 4
  • 29
  • 34
obratim
  • 467
  • 5
  • 13
1

If you dislike the DATALENGTH because of of n/varchar concerns, how about:

select DATALENGTH(@var)/isnull(nullif(DATALENGTH(left(@var,1)),0),1)

which is just

select DATALENGTH(@var)/DATALENGTH(left(@var,1))

wrapped with divide-by-zero protection.

By dividing by the DATALENGTH of a single char, we get the length normalised.

(Of course, still issues with surrogate-pairs if that's a concern.)

dsz
  • 4,542
  • 39
  • 35
-4

use SELECT DATALENGTH('string ')

aman6496
  • 143
  • 2
  • 13
  • 3
    you just restated others' answers from 7 years earlier and provider nothing new or even explain what you answer does or how it answers that question. – Jpsh Feb 15 '18 at 17:42