My problem is as follows: I am reading an XML file whose text nodes partly contain the UTF-8-encoded opening and closing double quotation marks (“ U+201C and ” U+201D). The text is extracted, truncated to 3999 bytes and put into a new XML format, which is then saved to a file.
While Notepad++ displays both characters correctly in the input file, the output file contains invalid UTF-8 that not even Notepad++ can display.
The opening double quotes are written correctly, but the closing ones are garbled.
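One step worth ruling out is the 3999-byte cut: if it is done on raw bytes, it can split a multi-byte sequence such as E2 80 9D at the end of the text. A minimal sketch of cutting on a code-point boundary instead, assuming the text is held as a Java `String` and the limit applies to its UTF-8 encoding (`Utf8Truncate` and `truncateUtf8` are hypothetical names, not part of any library):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    // Hypothetical helper: returns the longest prefix of s whose UTF-8
    // encoding fits in maxBytes, never cutting a code point in half.
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // Number of bytes this code point takes in UTF-8 (1 to 4).
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break; // the next character would not fit completely
            }
            bytes += len;
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String text = "ab\u201Dcd"; // closing quote (3 bytes in UTF-8) in the middle
        String cut = truncateUtf8(text, 4);
        // "ab" plus the quote would need 5 bytes, so the quote is dropped whole.
        System.out.println(cut + " (" + cut.getBytes(StandardCharsets.UTF_8).length + " bytes)");
    }
}
```

In the real program this would be called as `truncateUtf8(text, 3999)` before writing. A byte-level cut can only damage the last character, though, so it would not by itself explain a closing quote changing in the middle of the text.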
Using a hex editor, I found out that the bytes are changed from
E2 80 9D
in the input file to
E2 80 3F
in the output file (0x3F is the ASCII question mark). I am using the SAX parser for the XML parsing.
Are there any known bugs that could cause such behaviour?
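The exact byte pattern can be reproduced by a charset round-trip rather than a parser bug. A minimal sketch, assuming (this is not confirmed in the post) that somewhere the UTF-8 bytes are read and written as windows-1252, for example through a `FileReader`/`FileWriter` using the platform default encoding. Byte 0x9D is undefined in windows-1252, so Java decodes it to U+FFFD and encodes that back as `?` (0x3F), while the opening quote's 0x9C maps to `œ` and survives unchanged. The class name `QuoteRoundTrip` is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteRoundTrip {
    // Assumed misconfiguration: UTF-8 bytes handled as windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode with the wrong charset, then encode back with the same charset.
    static byte[] roundTrip(byte[] utf8Bytes) {
        String misread = new String(utf8Bytes, CP1252); // 0x9D -> U+FFFD (undefined byte)
        return misread.getBytes(CP1252);                // U+FFFD -> '?' (0x3F)
    }

    // Formats bytes as space-separated uppercase hex, e.g. "E2 80 9D".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] open = "\u201C".getBytes(StandardCharsets.UTF_8);  // E2 80 9C
        byte[] close = "\u201D".getBytes(StandardCharsets.UTF_8); // E2 80 9D
        System.out.println(hex(open) + " -> " + hex(roundTrip(open)));   // E2 80 9C -> E2 80 9C
        System.out.println(hex(close) + " -> " + hex(roundTrip(close))); // E2 80 9D -> E2 80 3F
    }
}
```

If this matches what happens, a likely fix is to make every reader and writer explicit about UTF-8 (for example `new InputStreamReader(in, StandardCharsets.UTF_8)` and `new OutputStreamWriter(out, StandardCharsets.UTF_8)`), or to hand the SAX parser an `InputStream` so it can detect the encoding from the XML declaration itself.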