3

I have an Array that I need to sort exactly like using order by in Oracle SQl. If I have following Array:

var array = ['Ba12nes','Apfel','Banane','banane','abc','ABC','123','2', null,'ba998ne']
var.sort(compare);

I would like to have the following result

var array = ['abc','ABC','Apfel','banane','Banane','Ba12nes','ba998ne','123','2', null]

If the null values are somewhere else, I don't have a Problem with it.

My current solution, which does not help me ^^

function compare(a,b) {

if(a == null)
    return -1;
if (b == null)
    return 1;
  if (a.toLowerCase() < b.toLowerCase())
     return -1;
  if (a.toLowerCase() > b.toLowerCase())
    return 1;
  return 0;
}

I do understand that i need a custom sorting function. And at the moment I am thinking that only a regular expression can solve the problem of sorting the string values in front of the numbers. But I am still not sure how to solve the Problem with lowercase letters in bevor Uppercase letters.

  • Usually the numbers are ordered before the chars (Your code will do that). If you really want the chars before the numbers, you will have to sort one char at a time. – Racil Hilan Oct 21 '15 at 13:29
  • Related, but not a duplicate http://stackoverflow.com/questions/2802341/javascript-natural-sort-of-alphanumerical-strings – Sean Vieira Oct 21 '15 at 13:31
  • 1
    "exactly like using order by in Oracle" - which is somewhat [dependent on NLS settings](https://docs.oracle.com/cd/E11882_01/server.112/e10729/ch5lingsort.htm), so you probably need to define your requirement explicitly rather than in relation to Oracle. – Alex Poole Oct 21 '15 at 16:53

3 Answers3

3

Iirc, Oracle implements a 3-tiered lexicographic sorting (but heed the advice of Alex Poole and check the NLS settings first):

  • First sort by base characters ignoring case and diacritics, digits come after letters in the collation sequence.
  • Second, on ties sort respecting diacritics, ignoring case.
  • Third, on ties sort by case.

You can emulate the behavior using javascript locale apis by mimicking each step in turn in a custom compare function, with the exception of the letter-digit inversion in the collation sequence.

Tackle the latter by identifying 10 contiguous code points that do not represent digits and that lie beyond the set of code points that may occur in the strings you are sorting. Map digits onto the the chosen code point range preserving order. When you sort, specify the Unicode collating extension 'direct' which means 'sorting by code point'. Remap after sorting.

In the PoC code below I have chosen some cyrillic characters.

function cmptiered(a,b) {
    //
    // aka oracle sort
    //
    return lc_base.compare(a, b) || lc_accent.compare(a, b) || lc_case.compare(a, b);
}  // cmptiered

var lc_accent   = new Intl.Collator('de', { sensitivity: 'accent' });
var lc_base     = new Intl.Collator('de-DE-u-co-direct', { sensitivity: 'base' });
var lc_case     = new Intl.Collator('de', { caseFirst: 'lower', sensitivity: 'variant' });

var array = ['Ba12nes','Apfel','Banane','banane','abc','ABC','123','2', null, 'ba998ne' ];

// Map onto substitute code blocks
array = array.map ( function ( item ) { return (item === null) ? null : item.replace ( /[0-9]/g, function (c) { return String.fromCharCode(c.charCodeAt(0) - "0".charCodeAt(0) + "\u0430".charCodeAt(0)); } ); } );

array.sort(cmptiered);

// Remap substitute code point
array = array.map ( function ( item ) { return (item === null) ? null : item.replace ( /[\u0430-\u0439]/g, function (c) { return String.fromCharCode(c.charCodeAt(0) - "\u0430".charCodeAt(0) + "0".charCodeAt(0)); } ); } );

Edit

Function cmptiered streamlined following Nina Scholz' comment.

collapsar
  • 17,010
  • 4
  • 35
  • 61
  • you may change the switch block with a simple `return lc_base.compare(a, b) || lc_accent.compare(a, b) || lc_case.compare(a, b);` – Nina Scholz Oct 21 '15 at 17:43
  • Most databases have NLS_COMP=BINARY. According to [the manual](https://docs.oracle.com/cd/B28359_01/server.111/b28286/sql_elements002.htm): "In binary comparison, which is the default, Oracle compares character strings according to the concatenated value of the numeric codes of the characters in the database character set." – Jon Heller Oct 21 '15 at 18:15
  • But according to [another manual](https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch5lingsort.htm#CIHBFIID) some things abide by `NLS_SORT` even when `NLS_COMP=BINARY`, most notably `ORDER BY` - which is what OP's question was about. – Vsevolod Golovanov Dec 09 '20 at 17:03
  • (I can confirm that that is the case with `ORDER BY`, at least with Oracle 11.2.0.4.0.) – Vsevolod Golovanov Dec 09 '20 at 17:09
  • I observe uppercase < lowercase though. – Vsevolod Golovanov Dec 09 '20 at 18:50
2

This proposals feature sorting without use of Intl.Collator. The first solution works with direct sort and comparing the given values.

var array = ['Ba12nes', 'Apfel', 'Banane', 'banane', 'abc', 'ABC', '123', '2', null, 'ba998ne'];
array.sort(function (a, b) {
    var i = 0;
    if (a === null && b === null) { return 0; }
    if (a === null) { return 1; }
    if (b === null) { return -1; }

    while (i < a.length && i < b.length && a[i].toLocaleLowerCase() === b[i].toLocaleLowerCase()) {
        i++;
    }

    if (isFinite(a[i]) && isFinite(b[i])) { return a[i] - b[i]; }
    if (isFinite(a[i])) { return 1; }
    if (isFinite(b[i])) { return -1; }

    return a.localeCompare(b);
});
document.write(JSON.stringify(array));

The second solution features a different approach, based on Sorting with map and a custom sort scheme which takes a new string. The string is build by this rules:

  1. If the value is null take the string 'null'.
  2. If a character is a decimal, takes the character with space paddded around, eg. if it is 9 take the string ' 9 '.
  3. Otherwise for every other character take two spaces and the character itself, like ' o'.

The new build string is used with a a.value.localeCompare(b.value).

Here are the strings with the mapped values:

'  B  a 1  2   n  e  s'
'  A  p  f  e  l'
'  B  a  n  a  n  e'
'  b  a  n  a  n  e'
'  a  b  c'
'  A  B  C'
' 1  2  3 '
' 2 '
'null'
'  b  a 9  9  8   n  e'

sorted, it became

'  a  b  c'
'  A  B  C'
'  A  p  f  e  l'
'  b  a  n  a  n  e'
'  B  a  n  a  n  e'
'  B  a 1  2   n  e  s'
'  b  a 9  9  8   n  e'
' 1  2  3 '
' 2 '
'null'

var array = ['Ba12nes', 'Apfel', 'Banane', 'banane', 'abc', 'ABC', '123', '2', null, 'ba998ne'],
    mapped = array.map(function (el, i) {
        var j, o = { index: i, value: '' };
        if (el === null) {
            o.value = 'null';
            return o;
        }
        for (j = 0; j < el.length; j++) {
            o.value += /\d/.test(el[j]) ? ' ' + el[j] + ' ' : '  ' + el[j];
        }
        return o;
    });
mapped.sort(function (a, b) {
    return a.value.localeCompare(b.value);
});
var result = mapped.map(function (el) {
    return array[el.index];
});
document.write(JSON.stringify(result));
Nina Scholz
  • 376,160
  • 25
  • 347
  • 392
0

A simple head on solution that works at least for english & russian (mimicking NLS_SORT=RUSSIAN) and doesn't rely on fancy things like Intl.Collator, locales and options that don't exist for IE<11.

function compareStringOracle(str1, str2) {
    if (str1 == null && str2 != null)
        return 1;
    else if (str1 != null && str2 == null)
        return -1;
    else if (str1 == null && str2 == null)
        return 0;
    else {
        return compareStringCaseInsensitiveDigitsLast(str1, str2) ||
            /* upper case wins between otherwise equal values, which can be checked with
                a simple binary comparison (at least for eng & rus) */
            ((str1 < str2) ? -1 : (str1 > str2) ? 1 : 0);
    }
}

function compareStringCaseInsensitiveDigitsLast(str1, str2) {
    for (var i = 0; i < str1.length; ++i) {
        if (i === str2.length)
            return 1;

        // toLocaleLowerCase is unnecessary for eng & rus
        var c1 = str1.charAt(i).toLowerCase();
        var c2 = str2.charAt(i).toLowerCase();

        var d1 = "0" <= c1 && c1 <= "9";
        var d2 = "0" <= c2 && c2 <= "9";

        if (!d1 && d2)
            return -1;
        else if (d1 && !d2)
            return 1;
        else if (c1 !== c2)
            return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
    }

    if (str1.length < str2.length)
        return -1;
    else
        return 0;
}
Vsevolod Golovanov
  • 4,068
  • 3
  • 31
  • 65