I was attempting what I hoped would be a trivial coding exercise: sorting JavaScript strings in an ASCII-style lexicographical order (e.g. digits before uppercase letters before lowercase letters...).

Here's a snippet:

var str1 = "ab";
var str2 = "Ab";
var n = str1.localeCompare(
    str2, "en", {sensitivity: 'variant', caseFirst: "upper"}
);

In this case, I would expect n to be 1, but it returns -1 instead.

From the documentation on this page:

  • the sensitivity value set as variant would allow differentiating between all base and accented letters, including case
  • the caseFirst value set as upper would force upper-cased letters to compare smaller than lower-cased letters
  • the lack of usage parametrization would default to sort, which would be irrelevant here anyway since I'm specifying variant
  • the lack of ignorePunctuation parametrization would default to false

I am assuming the options override the default locale settings, although I couldn't find any specific information on the matter. In truth if it defaulted to en-US and had priority over the options, then I imagine case would be ignored (e.g. see accepted answer here).

What am I doing wrong?

Notes

  • I am mentioning "ASCII" here for the sole purpose of identifying a sorting order that does not ignore case and sorts uppercase letters before lowercase letters where applicable. I would ultimately employ this for unicode strings as well.
  • As suggested by some, this is likely engine-dependent. Replicated with Firefox ESR 52.6.0, and Chromium 64.0.3282.167.
Mena
  • Running your code, I actually get `1`. – Patrick Roberts Mar 13 '18 at 14:32
  • Which result you get is browser dependent. It works in Chrome, but not in Edge. – Nina Scholz Mar 13 '18 at 14:42
  • @NinaScholz thanks, that makes sense. Added browser in notes. – Mena Mar 13 '18 at 14:44
  • Now added browser*s* in notes... – Mena Mar 13 '18 at 14:47
  • Are you looking for a solution that always works? Something like https://stackoverflow.com/questions/33260479/javascript-sorting-an-array-like-order-by-in-oracle/33269451#33269451 with the cases swapped? – Nina Scholz Mar 13 '18 at 14:50
  • @NinaScholz Thanks for the link. Interesting custom sorting, but unfortunately your solution #1 doesn't look like it would be working for me. Firstly it doesn't seem to yield `digit < letter` and from the specifications of the question it won't yield `uppercase letter < lowercase letter` either. – Mena Mar 13 '18 at 14:58
  • Note: All strings in JavaScript are Unicode (UTF-16 encoding). Also, while JavaScript does support locales, it does not natively support identifying all digit and letter characters, including by Unicode categories. So, you'd have to use lots of codepoint ranges for that. Google for code generators that help in this case. – Tom Blodget Mar 13 '18 at 15:36

2 Answers

You could use a workaround: look at the case of each letter and build a helper string that reflects the positions of the uppercase and lowercase letters.

Helper array before sorting

index  value
-----  -----
   0    a b 
   1    a  B
   2     Ab 
   3     A B

after sorting

index  value
-----  -----
   3     A B
   2     Ab 
   1    a  B
   0    a b 

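var array = ['ab', 'aB', 'Ab', 'AB'],
    mapped = array
        // Build one helper string per element: a character equal to its own uppercase form
        // gets a leading space, any other character gets a trailing space.
        .map((el, i) => ({ index: i, value: [...el].map(c => c === c.toUpperCase() ? ' ' + c : c + ' ').join('') }))
        // Sort the helper strings, not the original values.
        .sort((a, b) => a.value.localeCompare(b.value)),
    // Use the stored indices to read the original strings back in sorted order.
    result = mapped.map(el => array[el.index]);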

console.log(result);
Nina Scholz
  • Thanks Nina. That seems to be more cumbersome than Patrick's solution, so ultimately I'd pick his for simplicity. – Mena Mar 13 '18 at 15:52

I'm not sure how you're getting -1. When I run the exact code you've provided, I get 1. I only get -1 when testing with str1.localeCompare(str2). Perhaps make note of this warning:

Implementations are not required to support this property.
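
One way to see whether a given engine honours the option is to ask a collator which options it actually resolved. This is only a sketch: resolvedCollation is a hypothetical helper name, and it assumes the engine exposes Intl.Collator at all. An engine that does not implement caseFirst may leave that property out of the resolved options.

```javascript
// Hypothetical helper: reports which collation options the engine actually applied.
function resolvedCollation(locale, options) {
  var collator = new Intl.Collator(locale, options);
  var resolved = collator.resolvedOptions();
  return {
    locale: resolved.locale,
    sensitivity: resolved.sensitivity,
    // May be undefined in engines that do not implement the caseFirst option.
    caseFirst: resolved.caseFirst
  };
}

var info = resolvedCollation("en", { sensitivity: "variant", caseFirst: "upper" });
console.log(info);
```

If caseFirst comes back as anything other than "upper", the request was not applied and the result of localeCompare cannot be relied on for this ordering.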

If your sensitivity is set to variant and your caseFirst to upper, the ordering you are asking for is already what a plain string comparison gives you. Locale is typically used for treating character variants as equivalent. Also, in ASCII and Unicode, the basic Latin uppercase letters already come before the lowercase ones, and the < and > operators compare strings by UTF-16 code unit. So you just need -(str1 < str2) || +(str1 > str2), which avoids the function call altogether:

var str1 = "ab";
var str2 = "Ab";
var a = str1.localeCompare(
    str2, "en", {sensitivity: 'variant', caseFirst: "upper"}
);
var b = -(str1 < str2) || +(str1 > str2);
console.log(a, b);
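
The same expression can be wrapped as a comparator for Array.prototype.sort. This is a minimal sketch: compareCodeUnits is a hypothetical name and the sample words are made up for illustration.

```javascript
// Hypothetical helper: orders two strings by UTF-16 code unit,
// using the same expression as in the snippet above.
function compareCodeUnits(x, y) {
  return -(x < y) || +(x > y);
}

var words = ["ab", "Zz", "1a", "Ab", "aB"];
// slice() copies the array so the original order is left untouched.
var sorted = words.slice().sort(compareCodeUnits);
console.log(sorted); // ["1a", "Ab", "Zz", "aB", "ab"]
```

Digits come first, then uppercase letters, then lowercase letters, so "Zz" sorts before "aB".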
Patrick Roberts
  • Thanks for your answer. Nina suggests this is browser related. Will try around with different engines - I was expecting my code to be supported by all, but that doesn't mean the functionality might not differ ultimately, I guess... sigh. – Mena Mar 13 '18 at 14:45
  • Updated browsers and `var a` is still `-1` for me when I run your code right from the snippet. However weirdly enough your workaround solution does return `1` - investigating a bit more... – Mena Mar 13 '18 at 14:50
  • @Mena the "workaround solution", which I'm suggesting as an actual solution and not a browser-specific hack, relies on the specification. It's not weird that it returns `1`, its comparisons are based on the character unicode values. – Patrick Roberts Mar 13 '18 at 15:01
  • my poor choice of words. I am considering using that as it looks like the only reliable solution so far. – Mena Mar 13 '18 at 15:10
  • It occurs to me that there actually is a reason for you wanting to use `localeCompare()` though: With the proper support, you're going for the order: `AaBbCcDdEe...` whereas ASCII / Unicode order is `ABCDE... abcde...` e.g. you may not want `Z` < `a` – Patrick Roberts Mar 13 '18 at 15:13
  • actually I **do** want `Z < a` in this instance, hence the problem with `localeCompare` in the first place :) This is mimicking a Java-based sorting algorithm in the back-end where the default comparator will always consider upper-cased letters as less than lower-cased letters amongst others. – Mena Mar 13 '18 at 15:20