I'm looking for an efficient way to take a JavaScript string and return all of the scripts which occur in that string.
Full UTF-16 including the "astral" plane / non-BMP characters which require surrogate pairs must be correctly handled. This is possibly the main problem since JavaScript is not UTF-16 aware.
It only has to deal with codepoints so no fancy awareness of complex scripts or grapheme clusters is necessary. (This will be obvious to some of you anyway.)
Example:
stringToIso15924("παν語");
would return something like:
[ "Grek", "Hani" ]
I'm using node.js and some Unicode libraries such as XRegExp and unorm already so I don't mind adding other libraries that might already handle or ease such a feature.
I'm not aware of a JavaScript library that can look up character properties such as script codes, so this is probably the second part of the problem.
The third part of the problem is just to avoid inefficiencies.