3

I want to count the characters in an ntext field. Following Pinal Dave's advice, I am using datalength. But this function seems to double the value I am looking for. When I copy the value in the field into word and count the characters, I get 1502. But when I do

select datalength(result) from myTable 

I get a value of 3004 characters.

Why?

Bridge
  • 29,818
  • 9
  • 60
  • 82
bernie2436
  • 22,841
  • 49
  • 151
  • 244
  • Unrelated - be careful of year(entrydate). That can't use an index now. While entrydate >= '2012-01-01' and entrydate < '2013-01-01' would – Brian White Apr 04 '12 at 20:10

1 Answers1

8

Unicode is two bytes per character. Your NText field is a Unicode string. DataLength() returns the number of bytes required to store a field, Len() returns the number of characters.

From Len(): "Returns the number of characters of the specified string expression, excluding trailing blanks." DataLength does not exclude trailing blanks. For Unicode strings you can use DataLength( UnicodeStringExpression ) / DataLength( N'#' ) to get the length in characters.

In general DataLength( Left( Coalesce( StringExpression, '#' ), 1 ) ) will return the number of bytes per character since Coalesce returns a value based on data type precedence where Unicode strings are higher precedence than byte-string types (char and varchar).

declare @Foo as VarChar(10) = 'Foo and ';
declare @Bar as NVarChar(10) = N'Bar and ';

select @Foo as [@Foo],
  Len( @Foo ) as [Len (trimmed)], DataLength( @Foo ) as [DataLength (bytes)],
  DataLength( Left( Coalesce( @Foo, '#' ), 1 ) ) as BytesPerCharacter,
  DataLength( @Foo ) / DataLength( Left( Coalesce( @Foo, '#' ), 1 ) ) as 'Characters';

select @Bar as [@Bar],
  Len( @Bar ) as [Len (trimmed)], DataLength( @Bar ) as [DataLength (bytes)],
  DataLength( Left( Coalesce( @Bar, '#' ), 1 ) ) as BytesPerCharacter,
  DataLength( @Bar ) / DataLength( Left( Coalesce( @Bar, '#' ), 1 ) ) as 'Characters';
HABO
  • 15,314
  • 5
  • 39
  • 57
  • 1
    if you want to know the length of a text field, you are stuck with datalength because len() wont work. So does that mean that you have to always have the datalength of the character (because unicode is 2 bytes per character). – bernie2436 Apr 04 '12 at 20:15
  • @akh2103 - You simply need to divide the result of `DataLength` by two to get the number of characters. You can divide by `DataLength( N'X' )` as a hint/reminder. – HABO Apr 04 '12 at 21:32
  • 2
    And `Len()` ignores trailing spaces. – StevenWhite Jul 23 '13 at 19:19