I have XML that looks as follows:
<StartTag>
<MyValueTag>And the value itself contains a < bracket that makes the XML invalid</MyValueTag>
</StartTag>
The unescaped '<' character inside the element text makes the XML not well-formed, so parsers reject it.
The easiest fix would be at the source of the XML, but unfortunately I don't have control over how it is created. It contains messages like "The value is < than 10", where "<" is supposed to mean "less than".
Is there any way I can check the XML for things like this and escape those characters?
I tried looking at this post, where the author suggested using JTidy. But when I tried it, it didn't remove or escape the <:
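import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.w3c.tidy.Tidy;

// 'data' is the String holding the broken XML shown above.
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setWraplen(Integer.MAX_VALUE);
tidy.setPrintBodyOnly(true);
tidy.setXmlOut(true);
tidy.setSmartIndent(true);
ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
// Parse the input and write the tidied result to outputStream.
tidy.parseDOM(inputStream, outputStream);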
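For reference, the kind of pre-processing I have in mind could be sketched roughly like this. It is only a heuristic and not part of JTidy: the class name StrayBracketEscaper and the method escapeStrayLessThan are hypothetical names made up for illustration. The idea is to treat a '<' as markup only when it starts something that looks like a tag, comment, CDATA section, processing instruction, or DOCTYPE, and to replace every other '<' with &lt; before handing the text to a real XML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from any library): escapes '<' characters
// that do not appear to start XML markup.
public class StrayBracketEscaper {

    // Matches a '<' that is NOT followed by something tag-like.
    // Tag-like means: an optional '/', a name, zero or more name="value"
    // attributes, and a closing '>' (or '/>'); or the start of a comment,
    // CDATA section, processing instruction, or DOCTYPE declaration.
    private static final Pattern STRAY_LT = Pattern.compile(
        "<(?!"
        + "/?[A-Za-z_][\\w.:-]*(\\s+[\\w.:-]+\\s*=\\s*(\"[^\"]*\"|'[^']*'))*\\s*/?>"
        + "|!--|!\\[CDATA\\[|\\?|!DOCTYPE"
        + ")");

    // Returns the input with every stray '<' replaced by the entity &lt;.
    public static String escapeStrayLessThan(String xml) {
        return STRAY_LT.matcher(xml).replaceAll("&lt;");
    }

    public static void main(String[] args) {
        String broken =
            "<StartTag>\n"
            + "<MyValueTag>The value is < than 10</MyValueTag>\n"
            + "</StartTag>";
        System.out.println(escapeStrayLessThan(broken));
    }
}
```

This is an assumption-laden sketch rather than a general repair tool: it would also rewrite a '<' that legitimately appears inside a CDATA section or comment, it cannot tell apart text that happens to look exactly like a tag, and it does nothing about stray '&' characters. The result should still be run through a normal XML parser to confirm it is well-formed.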