
I have the following XML:

<?xml version='1.0' ?>
<foo>A&gt;B</foo>

and I just want to get the node value of the foo element as A&gt;B. If we use getNodeValue, it converts it to A>B, which is not what I need.

Hence I decided to use the Transformer:

        Document doc = getParsedDoc(abovexml);
        // Assumed: node is the <foo> element (it was not defined in the original snippet)
        Node node = doc.getDocumentElement();
        TransformerFactory tranFact = TransformerFactory.newInstance();
        Transformer transfor = tranFact.newTransformer();
        transfor.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        Source src = new DOMSource(node);
        StringWriter buffer = new StringWriter();
        Result dest = new StreamResult(buffer);
        transfor.transform(src, dest);
        String result = buffer.toString();

But this gives the following output in result: <foo>A&gt;B</foo>

It would be helpful if somebody could clarify whether there is an approach that yields A&gt;B without doing string manipulation on the above output (<foo>A&gt;B</foo>).

Vineet Reynolds
Babu

2 Answers

0

Since getNodeValue() automatically decodes the string, you can use StringEscapeUtils from Apache Commons Lang to encode it again.

http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
http://commons.apache.org/lang/

String nodeValue = StringEscapeUtils.escapeHtml(getNodeValue());

That encodes it into the format you want. It is not very performance-friendly, though, because the escaping is applied to every node value.
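If pulling in Commons Lang is not an option, the same re-escaping idea can be sketched with plain JDK code. This is a hand-written stand-in for the library call above, not the library's implementation; `XmlEscapeSketch` and `escapeXmlText` are hypothetical names, and the sketch only handles the five predefined XML entities.

```java
// Minimal sketch: re-escape a decoded node value using only the JDK.
// XmlEscapeSketch and escapeXmlText are hypothetical names, not a library API.
final class XmlEscapeSketch {

    // Replaces each character that has a predefined XML entity with that entity.
    static String escapeXmlText(String value) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "A>B" is the value the DOM holds after parsing <foo>A&gt;B</foo>.
        System.out.println(escapeXmlText("A>B")); // prints A&gt;B
    }
}
```

The ampersand is handled like any other character in a single pass, so an input that already contains `&gt;` would be escaped again to `&amp;gt;`; apply it only to decoded values.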

kensen john
  • Actually, `getNodeValue()` is not decoding the string. The string is decoded when it is parsed. In the information model, which is presumably how it's stored in memory, the string **is** `A>B`, not `A&gt;B`. The latter is just a serialization form. `getNodeValue()` returns the actual string, `A>B`. But the solution given here is correct: if you want an escaped form (`A&gt;B`), you need to ask for it, e.g. using an escape utility. – LarsH Feb 15 '12 at 15:56
0

Actually getNodeValue() is not "converting" the string. When the XML is parsed from a file, or produced by a transformation, the resulting information model is that the string is A>B, not A&gt;B. The latter is just a serialization form.
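A small sketch can make this visible: parse the XML from the question and look at what the DOM actually holds. `ParsedValueSketch` and `rootText` are hypothetical names introduced here for illustration. The sketch calls getTextContent() on the element, since getNodeValue() on an element node returns null and only the child text node carries the string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Minimal sketch: shows that the parsed DOM already contains the decoded text.
// ParsedValueSketch and rootText are hypothetical names.
final class ParsedValueSketch {

    // Parses the XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked parser exceptions.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0' ?><foo>A&gt;B</foo>";
        System.out.println(rootText(xml)); // prints A>B
    }
}
```

The entity reference is resolved by the parser, before any getter is called, so there is no "undecoded" value left in the tree to retrieve.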

Another legitimate serialization form is A>B (because a right angle bracket does not need to be escaped in most cases). However, there may be compatibility reasons for wanting to produce A&gt;B, especially if your output is intended to be HTML (though you didn't mention that).

If you have a good reason for escaping the >, then I agree with @kensen john's answer for getting that done.

Community
LarsH