String comparison - Javascript

Question

I'm trying to get my head around string comparisons in JavaScript.

function f(str) {
  // Compare the first character of the string with the last one.
  return str[0] < str[str.length - 1];
}
f("a+"); // false

In ASCII: 'a' == 97, '+' == 43

Am I correct in thinking that my test, f(str), is based on the ASCII values above?

you're close: it's the unicode value that matters when comparing two `String`s — dandavis, Dec 23 '15 at 15:40
Check [this](https://es5.github.io/#x11.8.5) out (the relevant part begins with 'Else, both px and py are Strings'). — raina77ow, Dec 23 '15 at 15:43
@nicael his point was that it works for characters beyond the ASCII set. — user513951, Dec 23 '15 at 15:44
There's a related question, not quite a duplicate, that can shed some light: http://stackoverflow.com/questions/8715980/javascript-strings-utf-16-vs-ucs-2 — Erick G. Hagstrom, Dec 23 '15 at 15:49

user513951 · Answer 1 · 2015-12-23T15:47:04.363

2

You don't need a function, or a complicated test that pulls a string apart, for this. Just evaluate `'a' < '+'` and see what happens. Or, more simply, check the character's code with `'a'.charCodeAt(0)`.
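As a minimal sketch of that suggestion, the snippet below runs the direct comparison and then prints the code behind each character. The variable names are only illustrative.

```javascript
// Compare the two single-character strings directly.
const direct = 'a' < '+';

// Look up the UTF-16 code unit behind each character.
const codeA = 'a'.charCodeAt(0);
const codePlus = '+'.charCodeAt(0);

console.log(direct);            // false
console.log(codeA, codePlus);   // 97 43
console.log(codeA < codePlus);  // false, matching the string comparison
```

The numeric comparison gives the same answer as the string comparison, which is what the question's f("a+") observed.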

edited Dec 23 '15 at 15:47

answered Dec 23 '15 at 15:41

user513951


score 1 · Accepted Answer · answered Dec 23 '15 at 15:50

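You are almost right. The comparison is based on Unicode code units (the 16-bit UTF-16 values, not code points), not on ASCII values. For characters in the ASCII range the two coincide, which is why 97 and 43 predict the result here.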

From the ECMAScript 2015 specification:

If both px and py are Strings, then
  If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
  If px is a prefix of py, return true.
  Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
  Let m be the integer that is the code unit value at index k within px.
  Let n be the integer that is the code unit value at index k within py.
  If m < n, return true. Otherwise, return false.
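The quoted steps can be sketched in plain JavaScript roughly as follows. This is an illustration only, not how any engine implements it, and `lessThanByCodeUnits` is a hypothetical helper name that does not appear in the specification.

```javascript
// Hypothetical helper: returns true when px < py under the quoted steps.
function lessThanByCodeUnits(px, py) {
  // If py is a prefix of px (this includes px === py), px is not less than py.
  if (px.startsWith(py)) return false;
  // If px is a prefix of py, px sorts first.
  if (py.startsWith(px)) return true;
  // Find the smallest index k at which the code units differ.
  // Neither string is a prefix of the other, so such a k exists.
  let k = 0;
  while (px.charCodeAt(k) === py.charCodeAt(k)) k++;
  const m = px.charCodeAt(k); // code unit value at index k within px
  const n = py.charCodeAt(k); // code unit value at index k within py
  return m < n;
}

console.log(lessThanByCodeUnits('a', '+'));      // false
console.log(lessThanByCodeUnits('ab', 'abc'));   // true
console.log(lessThanByCodeUnits('abc', 'abc'));  // false
```

For any pair of strings, the helper should agree with the built-in `<` operator, since both follow the same code unit ordering.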

NOTE 2

The comparison of Strings uses a simple lexicographic ordering on sequences of code unit values. There is no attempt to use the more complex, semantically oriented definitions of character or string equality and collating order defined in the Unicode specification. Therefore String values that are canonically equal according to the Unicode standard could test as unequal. In effect this algorithm assumes that both Strings are already in normalized form. Also, note that for strings containing supplementary characters, lexicographic ordering on sequences of UTF-16 code unit values differs from that on sequences of code point values.

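Basically, this means that string comparison uses a lexicographical ordering of code units, the 16-bit values of the string's UTF-16 encoding. For characters in the Basic Multilingual Plane, a code unit has the same value as the character's Unicode code point.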
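The two caveats in the specification note can be seen with a small sketch. The sample characters are arbitrary choices for illustration: "é" in composed and decomposed form, the supplementary character U+1F600, and the Basic Multilingual Plane character U+FF5E.

```javascript
// Two canonically equivalent spellings of "é".
const composed = '\u00E9';     // one code unit, U+00E9
const decomposed = 'e\u0301';  // 'e' followed by combining acute accent U+0301

console.log(composed === decomposed);  // false: the code unit sequences differ
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')); // true

// A supplementary character is stored as two code units (a surrogate pair).
const emoji = '\u{1F600}';  // code point U+1F600, code units 0xD83D 0xDE00
const bmpChar = '\uFF5E';   // code point U+FF5E, a single code unit 0xFF5E

console.log(emoji.codePointAt(0) > bmpChar.codePointAt(0)); // true: larger code point
console.log(emoji < bmpChar);                               // true: 0xD83D < 0xFF5E
```

So the emoji has the larger code point yet still compares as less than the other character, because only the first code unit of its surrogate pair takes part in the comparison.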

score 0 · Answer 3 · answered Dec 23 '15 at 15:45


JavaScript engines are allowed to use either UCS-2 or UTF-16 (which are the same for most practical purposes).

So, technically, your function compares UTF-16 code unit values, and here you were comparing 0x0061 ('a') and 0x002B ('+').
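To check those two hexadecimal values yourself, something like the following should work. `toHexCodeUnit` is a hypothetical helper written for this example, not a built-in.

```javascript
// Hypothetical helper: format the first code unit of a string as 4-digit hex.
function toHexCodeUnit(ch) {
  return '0x' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
}

console.log(toHexCodeUnit('a')); // 0x0061
console.log(toHexCodeUnit('+')); // 0x002B
```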


Alex Pakka

