Here is a function that computes the length in bytes of the UTF-8 serialization of a JavaScript string:
/* Compute the length in bytes of the UTF-8 serialization of string s. */
function utf8Length(s)
{
    var l = 0;
    for (var i = 0; i < s.length; i++) {
        var c = s.charCodeAt(i);
        if (c <= 0x007f) l += 1;                      // ASCII range: 1 byte
        else if (c <= 0x07ff) l += 2;                 // U+0080..U+07FF: 2 bytes
        else if (c >= 0xd800 && c <= 0xdfff) l += 2;  // surrogate: 2 bytes (a pair totals 4)
        else l += 3;                                  // rest of the BMP: 3 bytes
    }
    return l;
}
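For BMP characters this simply maps each UTF-16 code unit to 1, 2 or 3 bytes. A quick sanity check (the sample characters are arbitrary):

console.log(utf8Length("A"));             // 1 (U+0041, ASCII)
console.log(utf8Length("\u00E9"));        // 2 (é, two bytes in UTF-8)
console.log(utf8Length("\u20AC"));        // 3 (€, three bytes in UTF-8)
console.log(utf8Length("A\u00E9\u20AC")); // 6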
Characters outside the BMP (i.e. code points above 0xFFFF) make things more complicated: JavaScript represents them as surrogate pairs, which have to be identified and counted separately.
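For instance (the clef is just an arbitrary non-BMP character):

var s = "\uD834\uDD1E";        // U+1D11E MUSICAL SYMBOL G CLEF, stored as a surrogate pair
console.log(s.length);         // 2 (two UTF-16 code units)
console.log(utf8Length(s));    // 4 (2 bytes counted for each surrogate)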
Update: I updated the code so that it works with all of Unicode, not only the BMP.

However, the code now relies on a strong assumption: that the given string is well-formed UTF-16. It works by counting two bytes for every surrogate found in the string, which adds up correctly because a surrogate pair is encoded as 4 bytes in UTF-8 and no surrogate should ever be found outside a pair.
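If your environment provides the WHATWG TextEncoder (modern browsers, Node.js), you can cross-check the result against an actual encoder. Note that TextEncoder replaces an unpaired surrogate with U+FFFD (3 bytes), so the two counts can differ for malformed input; utf8LengthViaEncoder below is just an illustrative name, not part of the code above.

function utf8LengthViaEncoder(s) {
    // Encode to an actual UTF-8 byte array and count the bytes.
    return new TextEncoder().encode(s).length;
}

console.log(utf8LengthViaEncoder("A\u00E9\u20AC")); // 6, same as utf8Length
console.log(utf8LengthViaEncoder("\uD834\uDD1E"));  // 4, same as utf8Length
console.log(utf8LengthViaEncoder("\uD800"));        // 3 (lone surrogate becomes U+FFFD)
console.log(utf8Length("\uD800"));                  // 2 (counted as half a pair)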