A string in JavaScript is a counted sequence of UTF-16 code units. There is an implicit contract that the code units represent Unicode codepoints. Even so, it is possible to represent any sequence of UTF-16 code units—even unpaired surrogates.
I find String.fromCharCode(0xd801) shows up as the replacement character when printed, which seems quite reasonable (rather than undefined), although the string itself still holds the single code unit 0xD801. Any text function might substitute the replacement character but, for efficiency reasons, I expect that many text manipulations just pass invalid sequences through unless the manipulation requires interpreting them as codepoints.
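As far as I can tell, the replacement character is what you see when the value is rendered or encoded, not what is stored. A small sketch of that distinction, assuming an environment where TextEncoder is a global (browsers and recent Node.js); the variable names are only for illustration:

```javascript
// A lone high surrogate created from its code unit value.
const lone = String.fromCharCode(0xd801);

// The string keeps the code unit exactly as given.
const unit = lone.charCodeAt(0);
const length = lone.length;

// Operations that only move code units around pass it through untouched.
const wrapped = ("a" + lone + "b").slice(1, 2);
const survived = wrapped === lone;

// Encoding to UTF-8 has to interpret codepoints, so the lone surrogate
// is replaced by U+FFFD, whose UTF-8 bytes are EF BF BD.
// (Assumption: TextEncoder is available as a global.)
const utf8 = Array.from(new TextEncoder().encode(lone));

console.log(unit.toString(16), length, survived, utf8);
```

So concatenation and slicing leave the invalid sequence alone, while encoding is one place where the substitution actually happens.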
The easiest way to create such a string is with a string literal. For example, "\uD83D \uDEB2" or "\uD83D" or "\uDEB2" instead of the valid "\uD83D\uDEB2".
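If you want to check which of those literals are well formed, one way is to walk the code units and look for a surrogate without its partner. This is only a sketch; hasLoneSurrogate is a hypothetical helper name, not a built-in:

```javascript
// Hypothetical helper: true if s contains a surrogate that is not part of a pair.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = s.charCodeAt(i + 1); // NaN when i + 1 is past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        return true;
      }
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      // Low surrogate that was not consumed as part of a pair.
      return true;
    }
  }
  return false;
}

const samples = ["\uD83D \uDEB2", "\uD83D", "\uDEB2", "\uD83D\uDEB2"];
const results = samples.map(hasLoneSurrogate);
console.log(results);
```

Only the last sample, the properly paired one, should come back as false.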
"\uD83D \uDEB2".replace(" ","")
actually does return "\uD83D\uDEB2" ("🚲") but I don't think you should count on anything good coming from a string that isn't a valid UTF-16 encoding of Unicode codepoints.
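To see why the replace happens to work, here is a rough sketch comparing the string before and after in terms of code units and codepoints (the variable names are mine):

```javascript
const broken = "\uD83D \uDEB2";         // two lone surrogates around a space
const joined = broken.replace(" ", ""); // a string pattern removes the first match only

// The two surrogates are now adjacent, so they form a valid pair.
const sameAsLiteral = joined === "\uD83D\uDEB2";
const codePoint = joined.codePointAt(0); // the codepoint the pair encodes
const codeUnits = joined.length;         // counted in UTF-16 code units

// String iteration walks codepoints; a lone surrogate is yielded on its own.
const pointsAfter = Array.from(joined).length;
const pointsBefore = Array.from(broken).length;

console.log(sameAsLiteral, codePoint.toString(16), codeUnits, pointsAfter, pointsBefore);
```

The replace just removes a code unit; the result is valid only because the two halves happen to end up next to each other.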