-4

I'm trying to get my head around string comparisons in JavaScript.

function f(str) {
  // Compare the first character of the string with the last one.
  return str[0] < str[str.length - 1];
}
f("a+"); // false

In ASCII: 'a' == 97, '+' == 43

Am I correct in thinking that my test, f(str), is based on the ASCII values above?

cookie

3 Answers

2

You don't need a function or a complicated test that pulls a string apart for this. Just evaluate 'a' < '+' and see what happens. Or, more simply, check each character's code with 'a'.charCodeAt(0).
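A quick sketch of that check, runnable in Node.js or a browser console. The variable names are only illustrative.

```javascript
// Compare the two one-character strings directly.
const direct = 'a' < '+';

// Look up the UTF-16 code unit of each character.
const codeA = 'a'.charCodeAt(0);
const codePlus = '+'.charCodeAt(0);

console.log(direct);           // false
console.log(codeA, codePlus);  // 97 43
console.log(codeA < codePlus); // false, same result as the string comparison
```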

user513951
1

You are almost right. The comparison is based on Unicode code units (not code points; these are the 16-bit values of the UTF-16 encoding), not on ASCII values.

From the ECMAScript 2015 specification:

If both px and py are Strings, then
  If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
  If px is a prefix of py, return true.
  Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
  Let m be the integer that is the code unit value at index k within px.
  Let n be the integer that is the code unit value at index k within py.
  If m < n, return true. Otherwise, return false.

NOTE 2

The comparison of Strings uses a simple lexicographic ordering on sequences of code unit values. There is no attempt to use the more complex, semantically oriented definitions of character or string equality and collating order defined in the Unicode specification. Therefore String values that are canonically equal according to the Unicode standard could test as unequal. In effect this algorithm assumes that both Strings are already in normalized form. Also, note that for strings containing supplementary characters, lexicographic ordering on sequences of UTF-16 code unit values differs from that on sequences of code point values.

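Basically, it means that string comparison is based on a lexicographical ordering of "code units", which are the 16-bit values of the string's UTF-16 encoding. For characters in the Basic Multilingual Plane, which includes all of ASCII, a code unit has the same value as the character's Unicode code point.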
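The quoted steps can be sketched in plain JavaScript as follows. This is only an illustration of the algorithm, not how an engine implements it; the helper name isLessThan is hypothetical.

```javascript
// Hypothetical helper: returns true when px < py, following the quoted steps.
function isLessThan(px, py) {
  // If py is a prefix of px, return false.
  if (px.startsWith(py)) return false;
  // If px is a prefix of py, return true.
  if (py.startsWith(px)) return true;
  // Find the smallest index k at which the code units differ.
  // Such a k exists because neither string is a prefix of the other.
  let k = 0;
  while (px.charCodeAt(k) === py.charCodeAt(k)) k++;
  const m = px.charCodeAt(k); // code unit value at index k within px
  const n = py.charCodeAt(k); // code unit value at index k within py
  return m < n;
}

console.log(isLessThan('a', '+'));    // false, same as 'a' < '+'
console.log(isLessThan('ab', 'abc')); // true, 'ab' is a prefix of 'abc'

// Supplementary characters: code unit order differs from code point order.
const bmp = '\uFF5E';      // U+FF5E, a single code unit 0xFF5E
const astral = '\u{1F600}'; // U+1F600, stored as the surrogate pair 0xD83D 0xDE00
console.log(bmp < astral);                               // false (0xFF5E > 0xD83D)
console.log(bmp.codePointAt(0) < astral.codePointAt(0)); // true  (0xFF5E < 0x1F600)
```

The last two lines show the case the spec note warns about: the built-in < operator compares code units, so it can disagree with an ordering by code point.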

Tamas Hegedus
0

JavaScript engines are allowed to use either UCS-2 or UTF-16 (the two are the same for most practical purposes).

So, technically, your function is based on UTF-16 values and you were comparing 0x0061 and 0x002B.
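To see those two values for yourself, a small sketch along these lines prints each character's code unit in hexadecimal. The helper name hex is made up for this example.

```javascript
// Hypothetical helper: format the first code unit of a string as 0xXXXX.
const hex = (ch) =>
  '0x' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');

const str = 'a+';
const first = hex(str[0]);              // code unit of 'a'
const last = hex(str[str.length - 1]);  // code unit of '+'

console.log(first, last); // 0x0061 0x002B
```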

Alex Pakka