1

I have a web application that users use to send messages.

The problem is that the number of characters in a message determine the cost of sending the message.

I have noticed that the javascript UI code counts the characters just fine but the DBMS's in built functions sometimes return a higher number of character.

Here is an example of a string that exhibit's this anomalous behaviour:

String with different lengths..
This string has different lengths depending on the programming language use to count the characters.

Transact SQL LEN() and MySQL LENGTH() return 217.

Python len() returns 212.

The standard string length functions in Javascript and Python return similar values but lower than the values return by Transact-SQL's LEN() and DATALENGTH() and MySQL's LENGTH() (which also return values similar to each other).

So why the different values?

Steve S
  • 509
  • 1
  • 11
  • 24
  • Are these UTF8 strings? Counts may depend on whether the raw or encoded chars are counted. – tdelaney Feb 02 '15 at 19:40
  • @tdelaney, perhaps. I am wondering if whether collation could be causing this but I can't think of a way to test this. – Steve S Feb 03 '15 at 14:38

2 Answers2

1

I noticed that this only happens when the strings have newline characters in them.

SQL server counts '\r\n' as two characters.

My solution was to count the characters using something like

LEN(REPLACE(the_string, CHAR(13), ''))

to get rid of carriage return before counting the length of the string.

This stackoverflow entry and this one helped me a lot.

Community
  • 1
  • 1
Steve S
  • 509
  • 1
  • 11
  • 24
0

I can't speak to MySQL. For SQL Server:

LEN() function tells you how many characters exist in the string, excluding trailing white space. DATALENGTH() tells you how much space a given string takes up. For varchar data types, this will be 1 byte per character. For nvarchar data types, it's two bytes per character. Note that it DOES count white space when using DATALENGTH(). Here are some examples showing the use of the two functions with different strings and data types

select 
    LenTrailingSpace = len(' abc '),
    LenNoTrailingSpace = len(' abc'),
    DatalengthTrailingSpace = datalength(' abc '),
    DatalengthNoTrailingSpace = datalength(' abc'),
    UnicodeDatalengthTrailingSpace = datalength(N' abc '),
    UnicodeDatalengthNoTrailingSpace = datalength(N' abc')
Xedni
  • 3,662
  • 2
  • 16
  • 27
  • But in my case LEN() and DATALENGTH() both return the same value. I just wanted to rule out any trimming on whitespaces. – Steve S Feb 03 '15 at 12:41