This morning I was under the impression that .toUpperCase and .toLowerCase only translate the basic Latin chars a-z and A-Z and leave the more "exotic" characters alone, but on closer inspection that's really not the case...
console.log( "ﬁ".toLowerCase() ); // this yields a single char
>ﬁ
console.log( "ﬁ".toUpperCase() ); // this yields two chars
>FI
After reading the specs, it seems JavaScript applies the "Unicode Default Case Conversion algorithm", and that is a whole lot more complicated. The Unicode spec says the various mappings between upper, lower and title case are defined by the two files UnicodeData.txt and SpecialCasing.txt. I don't doubt that, but trying to make sense of them well enough to answer my question has brought me to the brink of a brain haemorrhage. Before I go any further I thought I would ask if anyone more familiar with Unicode knows...
edit: Thanks for your suggestions so far but THIS is my question...
Are there any Unicode upper to lower case conversions that might split a character into several chars?
And if so, is there a canonical JavaScript way to do a casing conversion that doesn't split any characters? I want a case conversion method to make a single-char substring search case insensitive. Consequently, it doesn't matter if the result is a string of mixed case as long as it is consistent, i.e. a single character is always translated to a single character, be it upper or lower.
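To make the question concrete, here is a rough sketch of the two things I have in mind. Both function names (findSplittingLowercase and foldPreservingLength) are made up for this post, not part of any library. The first brute-forces every code point and reports those whose .toLowerCase() result is more than one code point; I believe U+0130 (capital I with dot above) is one such case. The second lower-cases a string one code point at a time and simply leaves a character alone if lower-casing it would change its length. Converting one character at a time ignores context-sensitive rules such as the final sigma, which I assume is acceptable here since all I need is consistency.

```javascript
// Sketch only: both function names are hypothetical, invented for this post.

// Return every code point whose toLowerCase() result is more than one code point.
function findSplittingLowercase() {
  const hits = [];
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip lone surrogates
    const lower = String.fromCodePoint(cp).toLowerCase();
    if ([...lower].length > 1) hits.push(cp); // spread counts code points
  }
  return hits;
}

// Lower-case each code point on its own; keep the original character whenever
// the lower-cased form would have a different length in UTF-16 units.
function foldPreservingLength(str) {
  let out = "";
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    const lower = ch.toLowerCase();
    out += lower.length === ch.length ? lower : ch;
  }
  return out;
}

// List the offending code points in U+XXXX form.
console.log(
  findSplittingLowercase().map(
    (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0")
  )
);

// Compare lengths before and after folding; they should always match.
const sample = "\u0130STANBUL";
console.log(sample.length, foldPreservingLength(sample).length);
```

With this, a search would fold both the haystack and the single search character with foldPreservingLength before comparing, so an index found in the folded string still points at the same character in the original.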