I am using a SAX parser to parse an XML file whose declared encoding is utf-8y. How can I specify that in the SAX parser or the InputSource? I always get a parse exception.
2 Answers
1
I presume you're reading the file via an InputStream? The parser should be able to determine the file's encoding from the XML header. If you read the file into a String and then parse that, it tends to go pear-shaped.

Anya Shenanigans
Sorry, is it that there is a BOM at the start of the file? If that's the case, then there are several workarounds documented, e.g. http://webcache.googleusercontent.com/search?q=cache:5JOKO1VNetQJ:bugs.sun.com/bugdatabase/view_bug.do%3Fbug_id%3D6206835+saxparser+utf-8+BOM&cd=1&hl=en&ct=clnk&source=www.google.com If the XML header misstates the content of the file, then you could use a BufferedInputStream and rewrite the content while passing it into the parser. – Anya Shenanigans Jun 29 '11 at 21:20
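If a BOM does turn out to be the problem, one possible workaround is to strip it before the parser sees the stream. The following is only a minimal sketch: the class name `BomSkipper` and the method `skipUtf8Bom` are made up for illustration, and it uses a `PushbackInputStream` rather than the `BufferedInputStream` mentioned in the comment, since pushing back the peeked bytes is simpler that way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any library.
class BomSkipper {

    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer than requested.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream reached before three bytes were available
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: put the bytes back so the parser sees the document unchanged.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream can then be handed to the parser in place of the original one, e.g. `parser.parse(BomSkipper.skipUtf8Bom(in), handler)`.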
0
Just to make sure: is that 'y' part of the 'encoding' value in the XML declaration? If so, I am not surprised you get an error, since there is no such encoding. I assume this is a bug in whatever produced the document, and it should be fixed there.
But on your side, you have two main options:
- Construct an InputStreamReader yourself from the InputStream, passing "UTF-8" as the encoding
- Modify the input document before parsing to remove that 'y'
The first approach is simple, and most parsers should be OK with it. The second option can be used if the first doesn't work.
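The first option could look roughly like this. It is a sketch only: the class name `Utf8SaxExample` and the method `readText` are invented for the example, and it assumes the file's bytes really are UTF-8. When an `InputSource` wraps a character stream, SAX parsers are documented to disregard the encoding named in the XML declaration, which is why the bogus "utf-8y" should no longer matter.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not from the original question.
class Utf8SaxExample {

    // Decodes the stream as UTF-8 ourselves and returns all character data found in the document.
    static String readText(InputStream in) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Wrapping the bytes in a Reader means the parser receives characters, not bytes,
        // so it does not need to resolve the declared encoding name.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            parser.parse(new InputSource(reader), handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document with the invalid encoding name from the question.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"utf-8y\"?><root>h\u00e9llo</root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readText(new ByteArrayInputStream(xml)));
    }
}
```

If a particular parser still rejects the declaration, the second option (rewriting the header before parsing) remains the fallback.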

StaxMan