I have an XML file that contains some invalid characters (characters not supported in XML 1.0). When I try to parse the file, I get an exception saying that the XML contains invalid characters. Is there any way to parse XML with invalid characters, or to skip the node attribute that contains the invalid character?
Would it work if you forced the parser into XML 1.1 mode, or changed the XML prolog to declare your file as XML 1.1 (which is kind of a hack, but one of the easiest things to test)? – GPI Mar 06 '17 at 16:57
2 Answers
A possible workaround is to load the file as a string and replace each invalid character with a valid character or a marker tag, so you know where it was. Then parse as usual.
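A minimal sketch of that replacement step, assuming the "invalid characters" are code points outside the XML 1.0 `Char` production (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The class name `XmlSanitizer` and its method names are hypothetical, chosen only for illustration:

```java
// Hypothetical helper, not part of any library.
class XmlSanitizer {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every disallowed code point with the given replacement text.
    // Pass "" to drop the characters, or a visible marker such as "?" to keep
    // a trace of where they were.
    static String replaceInvalidXmlChars(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return out.toString();
    }
}
```

Calling `XmlSanitizer.replaceInvalidXmlChars(xmlText, "?")` on the loaded text before handing it to the parser would be one way to use it. This holds the whole document in memory, so it may not suit very large files.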

Good Game Industries
Thanks for the response. The XML file is very large, so that procedure is complex for me. I have the line number where the invalid characters occur, so is there any way to modify the content of that particular line? – Shrikant Mar 06 '17 at 15:52
So you mean there are unescaped characters such as &, < or > (or " or ' inside attributes)? You can write your own InputStream decorator that converts those "bad" characters into escaped ones. The decorator reads the invalid data and returns valid data to the next processing stage:
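import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
InputStream yourFancyIllegalCharConverter = new YourFancyIllegalCharConverter(realInputStream); // your own decorator class
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(yourFancyIllegalCharConverter);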
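A rough sketch of what such a decorator could look like. Note the differences from the snippet above: it wraps a Reader rather than an InputStream so that it works on decoded characters, and it drops characters that XML 1.0 does not allow instead of escaping markup characters. The class name `XmlCharFilterReader` and the `filter` helper are hypothetical, and the sketch assumes that any surrogate pairs in the input are well formed:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical decorator: passes characters through, silently dropping
// those that XML 1.0 does not allow. Works in a streaming fashion.
class XmlCharFilterReader extends FilterReader {

    XmlCharFilterReader(Reader in) {
        super(in);
    }

    // Surrogates are passed through unchanged (assumption: they are well paired).
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || Character.isSurrogate(c)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = buf[off + i];
                if (isAllowed(c)) {
                    buf[off + kept++] = c; // compact the allowed chars in place
                }
            }
            if (kept > 0) {
                return kept; // otherwise the whole chunk was filtered; read again
            }
        }
    }

    // Small demonstration helper: runs a string through the filter.
    static String filter(String xml) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new XmlCharFilterReader(new StringReader(xml))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Because the filter streams, it should cope with large files without loading them fully. For a file, you could wrap an `InputStreamReader` opened with the file's actual encoding (which you would need to know or detect yourself) and pass the result to `factory.createXMLStreamReader(Reader)`.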

Christian Ullenboom