2

According to my research and googling, Javascript seems to lack support for locale aware sorting and string comparisons. There is localeCompare(), but it has been reported of browser specific differencies and impossibility to explicitly set which locale is used (the OS locale is not always the one wanted). There is some intentions to add collation support inside ECMAScript, but before it, we are on our own. And depending how consistent the results are across browsers, may be we are on our own forever :(.

I have the following code, which makes alphabetical sort of an array. It's made speed in mind, and the ideas are got from https://stackoverflow.com/a/11598969/1691517, to which I made some speed improvements.

In this example, the words array has 13 members and the sort-function is called 34 times. I want to replace some of the letters in the words-array (you don't have to know what replacements are made, because it's not the point in this question). If I make these replacements in sort-function ( the one that starts with return function(a, b) ), the code is inefficient, because replacements are made more than once per array member. Of course I can make these replacements outside of this closure, I mean before the line words.sort(sortbyalphabet_timo);, but it's not what I want.

Question 1: Is it possible to modify the words-array in between the lines "PREPARATION STARTS" and "PREPARATION ENDS" so that the sort function uses modified words-array?

Question 2: Is it possible to input arguments to the closure so that code between PREPARATION STARTS and PREPARATION ENDS can use them? I have tried this without success:

var caseinsensitive = true;
words.sort( sortbyalphabet_timo(caseinsensitive) );

And here is finally the code example, and the ready to run example is in http://jsfiddle.net/3E7wb/:

var sortbyalphabet_timo = (function() {
  // PREPARATION STARTS
  var i, alphabet = "-0123456789AaÀàÁáÂâÃãÄäBbCcÇçDdEeÈèÉéÊêËëFfGgHhIiÌìÍíÎîÏïJjKkLlMmNnÑñOoÒòÓóÔôÕõÖöPpQqRrSsTtUuÙùÚúÛûÜüVvWwXxYyÝýŸÿZz",
  index = {};

  i = alphabet.length;
  while (i--) index[alphabet.charCodeAt(i)] = i;
  // PREPARATION ENDS

  return function(a, b) {
    var i, len, diff;

    if (typeof a === "string" && typeof b === "string") {
      (a.length > b.length) ? len = a.length : len = b.length;
      for (i = 0; i < len; i++) {
        diff = index[a.charCodeAt(i)] - index[b.charCodeAt(i)];

        if (diff !== 0) {
          return diff;
        }
      }
      // sort the shorter first
      return a.length - b.length;
    } else {
      return 0;
    }
  };
})();

var words = ['tauschen', '66', '55', '33', 'täuschen', 'andern', 'ändern', 'Ast', 'Äste', 'dosen', 'dösen', 'Donaudam-0', 'Donaudam-1'];
$('#orig').html(words.toString());
words.sort(sortbyalphabet_timo);
$('#sorted').html(words.toString());`
Community
  • 1
  • 1
Timo Kähkönen
  • 11,962
  • 9
  • 71
  • 112
  • You have a very interesting alphabet. – jbabey Sep 26 '12 at 16:59
  • Is it? The function is made by the intention, that the user can add whatever alphabet he/she wants. This is only an example of one possible character order. Please use the one you want. – Timo Kähkönen Sep 26 '12 at 17:06

1 Answers1

2

Is it possible to modify the words-array in between the lines "PREPARATION STARTS" and "PREPARATION ENDS" so that the sort function uses modified words-array?

No, not really. You don't have access to the array itself, your function only builds the compare-function that is later used when .sort is invoked on the array. If you needed to alter the array, you'll need to write a function that gets it as an argument; for example you could add a method on Array.prototype. It would look like

function mysort(arr) {
    // Preparation
    // declaration of compare function
    // OR execution of closure to get the compare function
    arr.sort(comparefn);
    return arr;
}

Is it possible to input arguments to the closure so that code between PREPARATION STARTS and PREPARATION ENDS can use them?

Yes, of course - that is the reason to use closures :-) However, you can't use sortbyalphabet_timo(caseinsensitive) with your current code. The closure you have is immediately invoked (called an IIFE) and returns the compare-function, which you pass into sort as in your demo.

If you want sortbyalphabet_timo to be the closure instead of the result, you have to remove the brackets after it. You also you can use arguments there, which are accessible in the whole closure scope (including the comparefunction):

var sortbyalphabet_timo_closure = function(caseinsensitive) {
    // Preparation, potentially using the arguments
    // Declaration of compare function, potentially using the arguments
    return comparefn;
}
// then use
words.sort(sortbyalphabet_timo_closure(true));

Currently, you are doing this:

var sortbyalphabet_timo_closure = function(/*having no arguments*/) {
    // Preparation, potentially using the arguments
    // Declaration of compare function, potentially using the arguments
    return comparefn;
}
var sortbyalphabet_timo = sortbyalphabet_timo_closure();
// then use
words.sort(sortbyalphabet_timo);

…which just caches the result of executing the closure, if you'd need to sort multiple times.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • Thanks! Why is your non-closure proposal so slow compared to closure-version? Closure-version is about 25 (or 198 in Safari) times faster: http://jsperf.com/collation-string-sorting/12 – Timo Kähkönen Sep 26 '12 at 20:14
  • Um, you didn't pass the compare function, but the closure into `sort()`. I think from http://jsperf.com/collation-string-sorting/14 it should be obvious why one is (a bit) faster. – Bergi Sep 26 '12 at 20:23
  • This answer has been worth of +1, because here we come to the truth behind my question. I asked "is it possible", but the real meaning is "how can I achieve it". How can I get all of these: speed of closure-version (so nothing is done multiple times if not needed), using arguments to select eg. insensitiveness and modify words-array before it is sorted? Am I requiring something impossible? – Timo Kähkönen Sep 26 '12 at 21:46
  • Yes. Using arguments *and* speed of cached-closure-result is a bit contradictory. Modifying the array has nothing to do with the closure. – Bergi Sep 26 '12 at 21:54
  • Modifying words-array can be done of course outside closure. If we want to eg. make sort insensitive, we can convert array members to lowercase or uppercase and if we want to make sort accentinsensitive, we can convert all accents to non-accents. – Timo Kähkönen Sep 26 '12 at 22:24
  • This maybe offtopic, but because sorting rules are so different between locales, strict compare of letter by letter is not enough. The function of question assumes that all characters are of same weight. If 'v' and 'w' are baseletters, the function succeed at sorting, but if 'w' is variant of 'v', there is a problem. The other complicateness is two- or three-letter-combinations, eg. 'nj', 'dz'. So we have to modify both alphabet-string to accept more variations and also words-array to make it comparable letter-by-letter in those special cases. – Timo Kähkönen Sep 26 '12 at 22:44
  • I really miss a ready-made js-library for this, but not found so far. – Timo Kähkönen Sep 26 '12 at 22:45