
I have an XML file that contains some invalid characters (characters not allowed in XML 1.0). When I try to parse the file, I get an exception saying that the XML contains invalid characters. Is there any way to parse XML that contains invalid characters, or to skip the node or attribute that contains them?

Shrikant
  • Would it work if you forced the parser into XML 1.1 mode, or changed the XML prolog to declare your file as XML 1.1 (which is kind of a hack, but one of the easiest things to test)? – GPI Mar 06 '17 at 16:57

2 Answers


A possible workaround is to load the file as a string and replace each invalid character with a valid character or a marker tag, so you know where it was. Then parse the result normally.
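
As a rough sketch of that idea in Java (the class name XmlSanitizer and the choice of U+FFFD as the marker are my own assumptions, not something from the question), you could match everything outside the XML 1.0 Char range with a regex and replace it before handing the string to the parser. This only makes sense if the whole document fits in memory:

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: the name and the replacement marker are illustrative only.
class XmlSanitizer {

    // Matches any code point NOT allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Replaces every invalid code point with U+FFFD so its position stays visible.
    static String sanitize(String xml) {
        return INVALID_XML10.matcher(xml).replaceAll("\uFFFD");
    }

    // Sanitizes first, then parses with the standard DOM parser.
    static Document parse(String xml) throws Exception {
        String clean = sanitize(xml);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(clean)));
    }
}
```

Note that this does not touch stray markup characters such as an unescaped & or <; it only deals with code points the XML 1.0 specification forbids outright.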

  • Thanks for the response. The XML file is very large, so that procedure is complicated for me. I have the line number where the invalid characters occur; is there any way to modify the content of that particular line? – Shrikant Mar 06 '17 at 15:52

So you mean there are unescaped characters such as &, < or > (or " and ' inside attributes) in the data? You can write your own InputStream decorator that converts those "bad" characters into escaped ones. The decorator reads the invalid data and hands valid data to the next processing stage:

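import java.io.InputStream;
import javax.xml.stream.*;
// YourFancyIllegalCharConverter: your own InputStream subclass (not a library class).
InputStream yourFancyIllegalCharConverter = new YourFancyIllegalCharConverter( realInputStream );
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader( yourFancyIllegalCharConverter );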
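
If the problem is code points that XML 1.0 forbids (as the question describes) rather than unescaped markup, the same decorator idea might look roughly like the sketch below. A few assumptions to flag: it wraps a Reader instead of an InputStream so it can work on decoded characters, the input is assumed to be UTF-8, surrogate halves are passed through unchecked, and the class name Xml10FilterReader is made up for illustration. Because it filters while streaming, the file never has to be loaded as one string.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical decorator: replaces characters that XML 1.0 does not allow.
class Xml10FilterReader extends FilterReader {

    private static final char REPLACEMENT = '\uFFFD';

    Xml10FilterReader(Reader in) {
        super(in);
    }

    // True for chars in the XML 1.0 Char range. Surrogate halves are let
    // through on the assumption that they form valid pairs.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || Character.isSurrogate(c)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == -1 || isAllowed((char) c)) ? c : REPLACEMENT;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (!isAllowed(buf[i])) {
                buf[i] = REPLACEMENT;
            }
        }
        return n;
    }

    // Wires the filter in front of a StAX parser; UTF-8 is an assumption here.
    static XMLStreamReader open(InputStream raw) throws XMLStreamException {
        Reader clean = new Xml10FilterReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8));
        return XMLInputFactory.newInstance().createXMLStreamReader(clean);
    }
}
```

Escaping stray & or < characters, as suggested above, is harder to do in a character-level filter, because the filter would need enough context to tell real markup from text content.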
Christian Ullenboom