
I'm trying to parse XML files provided to me, using the javax.xml DocumentBuilder. The files can contain tags whose content is quoted inner XML, which I do not want parsed.

Shortened example:

<Property Name="Value" PreFormatted="1">"<?xml version='1.0' encoding='UTF-16'?>"</Property>

When I run the parser like so:

Document document = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

I receive the following error:

[Fatal Error] :1:106: The processing instruction target matching "[xX][mM][lL]" is not allowed.

I understand that this error occurs when a document contains more than one XML declaration, but I can't figure out how to stop the parser from attempting to parse the quoted XML.

How can I prevent quoted XML from being parsed?

Sean Bright
  • That is illegal XML. You can't parse it. – SLaks Nov 02 '16 at 19:52
  • You will have to go back to whoever gave you that XML and tell them that it's wrong. The quotation marks should be done with entities (`&quot;`), not with `"` itself. – Joe C Nov 02 '16 at 20:59
  • Yeah, that's what I figured out. Apache Commons has a method, StringEscapeUtils.escapeXml10, which I used to escape the special characters. Apparently there is an issue escaping single quotes with `&apos;`, however, so I had to use StringUtils.replace manually. Thank you. – Christian Beasley Nov 03 '16 at 20:56
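
For anyone landing here, the escaping approach from the last comment might look roughly like the sketch below. It assumes you can get at the inner text before it is embedded in the element (or can ask the producer to do so). `escapeXml` and `rootText` are hypothetical helpers written for this example; `escapeXml` is a hand-rolled stand-in for a library escaper, not the Apache Commons method itself. Once the `<` and `&` characters are escaped, the parser sees plain text instead of a second XML declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class QuotedXmlExample {

    // Hypothetical helper: escapes the five XML special characters so the
    // text can be embedded as element content. '&' must be replaced first,
    // otherwise the ampersands of the other entities would be re-escaped.
    static String escapeXml(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;")
                  .replace("'", "&apos;");
    }

    // Hypothetical helper: parses the document and returns the text content
    // of its root element. Checked exceptions are wrapped for brevity.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // The quoted inner XML from the question, as a raw string.
        String inner = "\"<?xml version='1.0' encoding='UTF-16'?>\"";

        // Escape it before placing it inside the element.
        String xml = "<Property Name=\"Value\" PreFormatted=\"1\">"
                + escapeXml(inner) + "</Property>";

        // The parser decodes the entities, so this prints the original text.
        System.out.println(rootText(xml));
    }
}
```

The parser turns the entities back into the original characters, so the element's text content comes out as the quoted XML string, untouched. If the files arrive already broken, as in the question, they have to be fixed at the source or repaired as text before parsing; no parser setting will make them well-formed.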
