
I'm using javax.xml.stream.XMLStreamReader to parse XML documents. Unfortunately, some of the documents I'm parsing use non-IANA encoding names, like "macroman" and "ms-ansi". For example:

<?xml version="1.0" encoding="macroman"?>
<foo />

This causes the parse to blow up with an exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,42]
Message: Invalid encoding name "macroman".

Is there any way to provide a custom encoding handler to my XMLStreamReader so that I can augment it with support for the encodings I need?

Laurence Gonsalves
  • I'm assuming you don't have the ability to alter the stream so that it doesn't contain the encoding line? XMLStreamReader has its limitations, and this is one of them. – Dylan Mar 14 '19 at 16:45
  • It's unfortunate, but you may be better served by choosing a different XML library. – Dylan Mar 14 '19 at 16:45
  • @Dylan I'm not producing these documents, just consuming them, so I have no control over the encoding line unfortunately. Are there other XML libraries that are more flexible? – Laurence Gonsalves Mar 15 '19 at 22:40

1 Answer

You could wrap the input stream with a transformer that replaces the non-standard charset name in the XML declaration with an equivalent name that XMLStreamReader does understand.

See the related question "Filter (search and replace) array of bytes in an InputStream".
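
Here is a minimal sketch of that idea, under a few assumptions. Rather than doing a byte-level search and replace, it decodes the document with the matching Java charset, rewrites the declaration to say UTF-8, and hands UTF-8 bytes to the parser, so it does not depend on which names the parser itself accepts. The class name EncodingAliases, its methods, and the alias table are hypothetical. The mapping of "macroman" to x-MacRoman and "ms-ansi" to windows-1252 is my guess at what those labels mean, so check it against your data. It also assumes the real encoding is ASCII-compatible and that the document fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
final class EncodingAliases {
    // Assumed alias table: non-IANA label (lower case) -> Java charset name.
    private static final Map<String, String> ALIASES = Map.of(
            "macroman", "x-MacRoman",
            "ms-ansi", "windows-1252");

    // Captures the encoding pseudo-attribute of an XML declaration at the very start.
    private static final Pattern DECL = Pattern.compile(
            "^(<\\?xml\\s[^>]*?\\bencoding\\s*=\\s*[\"'])([^\"']+)([\"'])");

    // Returns a stream the parser can read: transcoded to UTF-8 if the declared
    // encoding is a known alias, otherwise the original bytes unchanged.
    static InputStream normalize(InputStream in) throws IOException {
        byte[] raw = in.readAllBytes();
        // The declaration is plain ASCII in ASCII-compatible encodings,
        // so decoding the first bytes as ISO-8859-1 is enough to inspect it.
        String head = new String(raw, 0, Math.min(raw.length, 256),
                StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (!m.find()) {
            return new ByteArrayInputStream(raw);
        }
        String javaName = ALIASES.get(m.group(2).toLowerCase(Locale.ROOT));
        if (javaName == null) {
            return new ByteArrayInputStream(raw);
        }
        // Decode with the real charset, then make the declaration match the new bytes.
        String text = new String(raw, Charset.forName(javaName));
        String fixed = DECL.matcher(text).replaceFirst("$1UTF-8$3");
        return new ByteArrayInputStream(fixed.getBytes(StandardCharsets.UTF_8));
    }

    // Convenience wrapper: normalize, then create the StAX reader as usual.
    static XMLStreamReader open(InputStream in) throws IOException, XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(normalize(in));
    }
}
```

You would then call EncodingAliases.open(inputStream) wherever you currently call createXMLStreamReader(inputStream). For large documents you would want a streaming variant that only rewrites the prolog and wraps the rest in an InputStreamReader, but the buffering version is easier to get right.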

Rich