In a string there is no "invalid" value for a char. There are "invalid Unicode code points", but a string can contain them without problems, because a string is a "dumb container". Note, though, that some string methods are "more intelligent" and don't much like invalid code points: normally they skip them or replace them with a substitution character.
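As a rough illustration of the "dumb container" point, here is a minimal sketch in Python. Python is only a stand-in here, since its `str` behaves similarly in this respect, and the variable names are mine. It stores a lone surrogate (not a valid code point on its own) and then hands the same string to a stricter operation.

```python
# A lone surrogate is not a valid Unicode scalar value,
# but the string stores it without complaint.
s = "a\ud800b"

stored_length = len(s)          # the surrogate counts as one element
stored_value = hex(ord(s[1]))   # and can be read back unchanged

# A "more intelligent" operation (strict UTF-8 encoding) rejects it...
try:
    s.encode("utf-8")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "rejected"

# ...or, if asked, substitutes a replacement character instead.
replaced = s.encode("utf-8", errors="replace")

print(stored_length, stored_value, strict_result, replaced)
```

The container keeps the value as-is; only the operations that have to interpret it object or substitute.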
Now... "visualizers" (modules/functions/methods that have to "show" a string) often have limitations and can't show all characters (even perfectly valid ones)... A classic example is Zalgo text. This is the problem you are hitting, but it is a different problem from having "invalid" chars :-)
For example, in Windows there are at least 4 "official" APIs for writing text to the screen: GDI, GDI+, Uniscribe, DirectWrite. And many programs (games primarily) use the FreeType library as an alternative... Each of these libraries supports only some parts of Unicode.
I'll add that the character that is causing you problems (0x85) is called NEL, or Next Line. It is a control character, so not something that should be "shown", and it has a complex and odd history that could explain why it is sometimes shown as an ellipsis:
the code for NEL has been used as the ellipsis ('…') character in Windows-1252.
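The double life of byte 0x85 can be checked with a small sketch, again in Python purely for illustration (variable names are mine). The same byte is decoded once as Latin-1, where it maps straight to U+0085, and once as Windows-1252.

```python
import unicodedata

raw = bytes([0x85])

# Latin-1 maps every byte to the code point with the same number: U+0085 (NEL).
as_latin1 = raw.decode("latin-1")
# Windows-1252 assigns the same byte to the ellipsis, U+2026.
as_cp1252 = raw.decode("cp1252")

nel_category = unicodedata.category(as_latin1)   # general category of U+0085
ellipsis_name = unicodedata.name(as_cp1252)      # official name of U+2026

print(hex(ord(as_latin1)), nel_category)
print(hex(ord(as_cp1252)), ellipsis_name)
```

So a program that guesses Windows-1252 for a byte that was meant as NEL will render an ellipsis instead of treating it as a control character.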
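Support for NEL, and for the related LS (U+2028, Line Separator) and PS (U+2029, Paragraph Separator), as line breaks is also uneven. For instance: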
YAML[8] no longer recognizes them as special, in order to be compatible with JSON.
ECMAScript[9] accepts LS and PS as line breaks, but considers U+0085 (NEL) white space, not a line break.
Microsoft Windows 2000 does not treat any of NEL, LS or PS as a line break in the default text editor, Notepad.
On Linux, a popular editor, gedit, treats LS and PS as newlines but not NEL.
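To see the same inconsistency inside a single runtime, here is a small Python sketch (Python is my choice for illustration and is not one of the tools listed above). It splits the same text with two standard string methods and checks whether NEL counts as white space.

```python
# NEL (U+0085), LS (U+2028) and PS (U+2029) used as separators.
text = "one\u0085two\u2028three\u2029four"

# str.splitlines() treats all three as line boundaries.
by_splitlines = text.splitlines()

# Splitting on "\n" alone sees no line break at all.
by_newline = text.split("\n")

# NEL is also classified as white space.
nel_is_space = "\u0085".isspace()

print(by_splitlines)
print(len(by_newline), nel_is_space)
```

Which answer is "right" depends on the method used, which is the same situation the tools above are in.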