0

I want to parse an XML document whose tag name contains an &, for example: <xml><OC&C>12.4</OC&C></xml>. I tried escaping & as &amp;, but that only fixes the problem for values, not for tag names. My code currently throws an exception; the complete function is below.

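import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseAmpersandTag
{
  public static void main(String[] args) throws Exception
  {
    String xmlString        = "<xml><OC&C>12.4</OC&C></xml>";
    // Escaping turns the tag into <OC&amp;C>, which is still not a legal element name.
    xmlString = xmlString.replaceAll("&", "&amp;");
    String path             = "xml";
    InputSource inputSource = new InputSource(new StringReader(xmlString));
    try
    {
      // parse() throws SAXParseException here because the input is not well-formed.
      Document xmlDocument            = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(inputSource);
      XPath xPath                     = XPathFactory.newInstance().newXPath();
      XPathExpression xPathExpression = xPath.compile(path);

      System.out.println("Compiled Successfully.");
    }
    catch (SAXException e)
    {
      System.out.println("Error while retrieving node Path:" + path + " from " + xmlString + ": " + e.getMessage());
    }
  }
}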
CAMOBAP
Wael

3 Answers

2

Hmmm... I don't think that is a legal XML name. I'd think about using a regex to replace OC&C with something legal first, and then parse it.
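
The replace-then-parse idea could be sketched roughly as follows. This is only an illustration under assumptions: the class name TagNameSanitizer, the sanitize helper, and the "_and_" substitute are all made up for the example, and the regex treats every "<" as the start of a tag, so it would also rewrite text inside CDATA sections or comments.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TagNameSanitizer {
    // Captures an optional "/" and then the tag name: everything up to
    // whitespace, "/", "<" or ">". Names starting with "!" or "?" are skipped,
    // so comments, CDATA openers and the XML declaration are left alone.
    private static final Pattern TAG_NAME = Pattern.compile("<(/?)([^<>\\s/!?]+)");

    // Hypothetical helper: replaces "&" inside tag names with "_and_".
    static String sanitize(String raw) {
        Matcher m = TAG_NAME.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = "<" + m.group(1) + m.group(2).replace("&", "_and_");
            // quoteReplacement guards against "$" or "\" in the tag name.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String cleaned = sanitize("<xml><OC&C>12.4</OC&C></xml>");
        // Only after the names are legal can a real XML parser take over.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(cleaned)));
        System.out.println(doc.getDocumentElement().getFirstChild().getNodeName());
    }
}
```

If the original names matter downstream, the substitute has to be mapped back after parsing, and it must not collide with a name that already exists in the data.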

Arawak
  • About processing XML with regexes, see this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Cephalopod Jan 29 '13 at 15:16
  • @Arian I don't think Arawak is suggesting he *parse* the document with regex. He is suggesting a search and replace, which regex is well suited for. Then parse with the XML parser. – iagreen Jan 29 '13 at 15:23
  • 1
    You cannot safely search&replace an XML document without parsing it. – Cephalopod Jan 29 '13 at 15:24
1

It's not "an XML". It's a non-XML. XML doesn't allow ampersands in names. Therefore, you can't parse it successfully using an XML parser.
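
A quick way to see this is to feed the candidates to a parser and check whether it accepts them. The sketch below is illustrative only; the class WellFormedCheck and its isWellFormed helper are names invented for the example. It shows that both the raw and the &amp;-escaped tag names are rejected, while a name without the ampersand goes through.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedCheck {
    // Hypothetical helper: true if the default JAXP parser accepts the input.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for any well-formedness violation, such as "&" in a name.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<xml><OC&C>12.4</OC&C></xml>"));
        System.out.println(isWellFormed("<xml><OC&amp;C>12.4</OC&amp;C></xml>"));
        System.out.println(isWellFormed("<xml><OCC>12.4</OCC></xml>"));
    }
}
```

The default parser may also print its own fatal-error message to standard error for the rejected inputs.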

Michael Kay
0

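Names beginning with xml are reserved by the XML specification, so xml is a poor choice of element name, and the ampersand in OC&C means the fragment cannot be parsed as it stands. You could wrap the fragment in a CDATA section instead, so that it is carried as plain character data: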

<name><![CDATA[<OC&C>12.4</OC&C>]]></name>
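
Reading such a wrapper back in Java might look like the sketch below. The class CdataExample and the readText helper are hypothetical names used for illustration. Note that the parser hands back the fragment as a plain string, so OC&C is not available as an element and would still need separate handling.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataExample {
    // Hypothetical helper: parses the document and returns the text of its root
    // element. CDATA content is returned verbatim, markup characters included.
    static String readText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String wrapped = "<name><![CDATA[<OC&C>12.4</OC&C>]]></name>";
        System.out.println(readText(wrapped));
    }
}
```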
Cylian