I have a problem right now that is the result of current limitations on a server our team does not control.
We have a job that should have been done by a database, but we're forced to use an XML file and parse it using JavaScript/jQuery. We don't even have write access for our scripts (only through our FTP account)... we don't like to talk about it, but that's what we've got.
The problem, as a result of those limitations, is that we need to parse a large XML file that's around 500 KB, with 1700-ish records of document name/number/URL.
The numbers are fairly complex, such as "31-2b-1029E", mixed with values like "T2315342".
So I figured that I need to use something called "Natural Sort" (thank you, Stack Overflow).
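To make the ordering I'm after concrete, here is a minimal sketch of what I mean by natural sort, compared with the default string sort. The sample values other than the two quoted above are made up for illustration, and `chunk` and `simpleNaturalCompare` are hypothetical helper names, not part of any library:

```javascript
// Sample values shaped like our document numbers.
// Only "31-2b-1029E" and "T2315342" come from real data; the rest are made up.
var samples = ["31-2b-1029E", "T2315342", "31-2b-99E", "4-1a-7", "T99"];

// Split a string into alternating digit and non-digit runs,
// e.g. "31-2b-99E" -> ["31", "-", "2", "b-", "99", "E"].
function chunk(s) {
    return String(s).match(/\d+|\D+/g) || [];
}

// Compare chunk by chunk: digit runs by numeric value, other runs as text.
function simpleNaturalCompare(a, b) {
    var ca = chunk(a), cb = chunk(b);
    var n = Math.min(ca.length, cb.length);
    for (var i = 0; i < n; i++) {
        if (ca[i] === cb[i]) continue;
        var aNum = /^\d/.test(ca[i]), bNum = /^\d/.test(cb[i]);
        if (aNum && bNum) {
            var diff = parseInt(ca[i], 10) - parseInt(cb[i], 10);
            if (diff !== 0) return diff;
            continue; // same value with different padding, e.g. "07" vs "7"
        }
        if (aNum !== bNum) return aNum ? -1 : 1; // digit runs sort before text runs
        return ca[i] < cb[i] ? -1 : 1;
    }
    return ca.length - cb.length;
}

var plain = samples.slice().sort();
// plain   -> ["31-2b-1029E", "31-2b-99E", "4-1a-7", "T2315342", "T99"]
var natural = samples.slice().sort(simpleNaturalCompare);
// natural -> ["4-1a-7", "31-2b-99E", "31-2b-1029E", "T99", "T2315342"]
```

The default sort puts "31-2b-1029E" before "31-2b-99E" because it compares character by character, which is exactly what I want to avoid.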
Anyway, I tried using this script:
/*
 * Reference: http://www.overset.com/2008/09/01/javascript-natural-sort-algorithm/
 * Natural Sort algorithm for Javascript - Version 0.6 - Released under MIT license
 * Author: Jim Palmer (based on chunking idea from Dave Koelle)
 * Contributors: Mike Grier (mgrier.com), Clint Priest, Kyle Adams, guillermo
 */
function naturalSort (a, b) {
    var re = /(^-?[0-9]+(\.?[0-9]*)[df]?e?[0-9]?$|^0x[0-9a-f]+$|[0-9]+)/gi,
        sre = /(^[ ]*|[ ]*$)/g,
        dre = /(^([\w ]+,?[\w ]+)?[\w ]+,?[\w ]+\d+:\d+(:\d+)?[\w ]?|^\d{1,4}[\/\-]\d{1,4}[\/\-]\d{1,4}|^\w+, \w+ \d+, \d{4})/,
        hre = /^0x[0-9a-f]+$/i,
        ore = /^0/,
        // convert all to strings and trim()
        x = a.toString().replace(sre, '') || '',
        y = b.toString().replace(sre, '') || '',
        // chunk/tokenize
        xN = x.replace(re, '\0$1\0').replace(/\0$/,'').replace(/^\0/,'').split('\0'),
        yN = y.replace(re, '\0$1\0').replace(/\0$/,'').replace(/^\0/,'').split('\0'),
        // numeric, hex or date detection
        xD = parseInt(x.match(hre)) || (xN.length != 1 && x.match(dre) && Date.parse(x)),
        yD = parseInt(y.match(hre)) || xD && y.match(dre) && Date.parse(y) || null;
    // first try and sort Hex codes or Dates
    if (yD)
        if ( xD < yD ) return -1;
        else if ( xD > yD ) return 1;
    // natural sorting through split numeric strings and default strings
    for(var cLoc=0, numS=Math.max(xN.length, yN.length); cLoc < numS; cLoc++) {
        // find floats not starting with '0', string or 0 if not defined (Clint Priest)
        // (declared with var here so they do not leak as implicit globals)
        var oFxNcL = !(xN[cLoc] || '').match(ore) && parseFloat(xN[cLoc]) || xN[cLoc] || 0;
        var oFyNcL = !(yN[cLoc] || '').match(ore) && parseFloat(yN[cLoc]) || yN[cLoc] || 0;
        // handle numeric vs string comparison - number < string - (Kyle Adams)
        if (isNaN(oFxNcL) !== isNaN(oFyNcL)) return (isNaN(oFxNcL)) ? 1 : -1;
        // rely on string comparison if different types - i.e. '02' < 2 != '02' < '2'
        else if (typeof oFxNcL !== typeof oFyNcL) {
            oFxNcL += '';
            oFyNcL += '';
        }
        if (oFxNcL < oFyNcL) return -1;
        if (oFxNcL > oFyNcL) return 1;
    }
    return 0;
}
And applied using:
// Natural Sort (disabled because it is super freaking slow.... need xsl transform sorting instead)
var sortedSet = $(data).children("documents").children("document").sort(function(a, b) {
    return naturalSort($(a).children('index').text(), $(b).children('index').text());
});
This works fine on our other, smaller XML file, but for the giant 500 KB-ish file, Safari (v4) simply hangs for up to a few minutes while sorting, and Firefox (latest) takes around 10 seconds to process it (still not good, but at least sane).
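One thing I have not ruled out is that the comparator above wraps both elements in jQuery and reads their text on every single comparison. A sketch of a variant that extracts each key once per record before sorting follows. I have not benchmarked it against the real file, and `sortByPrecomputedKey` is a hypothetical helper name of my own:

```javascript
// Sort items by a key that is extracted once per item,
// instead of once per comparison inside the comparator.
// getKey and compare are supplied by the caller.
function sortByPrecomputedKey(items, getKey, compare) {
    var decorated = [];
    for (var i = 0; i < items.length; i++) {
        decorated.push({ key: getKey(items[i]), item: items[i] });
    }
    decorated.sort(function (p, q) {
        return compare(p.key, q.key);
    });
    var result = [];
    for (var j = 0; j < decorated.length; j++) {
        result.push(decorated[j].item);
    }
    return result;
}

// Intended usage with the code above (needs jQuery and the parsed XML in `data`):
// var docs = $(data).children("documents").children("document").get();
// var sortedSet = sortByPrecomputedKey(docs, function (el) {
//     return $(el).children("index").text();
// }, naturalSort);
```

This still runs the comparison logic in JavaScript, so it would only help if the repeated jQuery lookups are a significant part of the cost.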
I also found this other smaller/lighter script called Alphanum:
function alphanum(a, b) {
    function chunkify(t) {
        var tz = [], x = 0, y = -1, n = 0, i, j;
        while (i = (j = t.charAt(x++)).charCodeAt(0)) {
            var m = (i == 46 || (i >=48 && i <= 57));
            if (m !== n) {
                tz[++y] = "";
                n = m;
            }
            tz[y] += j;
        }
        return tz;
    }
    var aa = chunkify(a);
    var bb = chunkify(b);
    // x is declared with var here so it does not leak as an implicit global
    for (var x = 0; aa[x] && bb[x]; x++) {
        if (aa[x] !== bb[x]) {
            var c = Number(aa[x]), d = Number(bb[x]);
            if (c == aa[x] && d == bb[x]) {
                return c - d;
            } else return (aa[x] > bb[x]) ? 1 : -1;
        }
    }
    return aa.length - bb.length;
}
This runs faster in Safari, but it still locks up the browser for a minute or so.
I did some research, and it seems that a few people recommend using XSL to sort the XML entries, which is apparently much faster because it is built into the browser instead of running on top of JavaScript.
There are apparently several different implementations, with Sarissa getting mentioned several times, although its SourceForge page seems to indicate that the last update occurred on 2011-06-22.
There are also other choices, such as xslt.js.
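For context, this is roughly what I understand the browser-native route to look like without any library, using `XSLTProcessor` where the browser provides it. The element names `documents`, `document` and `index` come from my jQuery selectors above; `SORT_XSL` and `sortWithXslt` are hypothetical names. Note that, as far as I can tell, `xsl:sort` in XSLT 1.0 only offers text or number ordering, so this sketch does a plain text sort and not a natural sort:

```javascript
// XSLT 1.0 stylesheet that copies every <document> under the root <documents>,
// ordered by the text of its <index> child (plain text order, NOT natural order).
var SORT_XSL = [
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
    '  <xsl:template match="/documents">',
    '    <documents>',
    '      <xsl:for-each select="document">',
    '        <xsl:sort select="index" data-type="text" order="ascending"/>',
    '        <xsl:copy-of select="."/>',
    '      </xsl:for-each>',
    '    </documents>',
    '  </xsl:template>',
    '</xsl:stylesheet>'
].join("\n");

// Takes a parsed XML document and returns a new, sorted XML document.
// Only works where the browser exposes XSLTProcessor and DOMParser.
function sortWithXslt(xmlDoc) {
    if (typeof XSLTProcessor === "undefined" || typeof DOMParser === "undefined") {
        throw new Error("XSLTProcessor/DOMParser is not available in this environment");
    }
    var xslDoc = new DOMParser().parseFromString(SORT_XSL, "application/xml");
    var processor = new XSLTProcessor();
    processor.importStylesheet(xslDoc);
    return processor.transformToDocument(xmlDoc);
}

// Intended usage, assuming `data` is the XML document from the AJAX call:
// var sortedDoc = sortWithXslt(data);
// var sortedSet = $(sortedDoc).children("documents").children("document");
```

What I cannot work out is how to get from this plain text ordering to a natural ordering of the index values.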
My questions are:
- Is XSL the best sorting option for this particular problem?
- If so, how can I use XSL to do a Natural Sort? (URLs to resources?)
- If yes to both questions, which library should I use for the best compatibility and speed?
- If XSL is not the best choice, then which one is?
Thanks for looking at my problem.