6

To parse my XML with JAXB, I have generated the required POJOs and can parse the XML successfully. However, parsing fails whenever the XML text contains '&', '<' or '>' characters. According to the XML rules these must be escaped (for example, '&' as '&amp;'), but the third-party product (3PP) that generates the XML does not follow the rules. How can I parse XML that contains unescaped '&', '<' and '>' characters?

Note: I found many answers for marshalling, but none of them work for unmarshalling.

Environment - Java 8

XML Example :

<Customer Info> This is & Customer Info <Customer Info>

Any help would be appreciated.

Malolan
Souvik
  • Are you sure it is not related to the fact that you use white space in the root element name? Try to use `Customer_Info` instead of `Customer Info`. – Michał Ziober Apr 15 '19 at 19:16
  • This is a dummy XML; my problem is with the text 'This is & Customer Info', which contains an & sign. – Souvik Apr 15 '19 at 20:25
  • I created a simple app which serialises and deserialises XML with these chars without any problem. Could you create a simple app which reproduces the error? – Michał Ziober Apr 15 '19 at 20:32
  • A sample program is: `public static void main(String args[]) throws Exception { JAXBContext jaxbContext = JAXBContext.newInstance(Document.class); Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller(); Document document = (Document) jaxbUnmarshaller.unmarshal(new File("CUSTINFO.xml")); }` This is a demo only. If you can paste your program, that would be helpful. Also, can you check whether you have any special configuration in your XSD? – Souvik Apr 15 '19 at 21:21
  • I see it now. I thought you serialise and deserialise `&`. But you only need to deserialise `&` from `XML`. – Michał Ziober Apr 15 '19 at 21:31
  • Yes. I am receiving the XML from 3PP applications, and I have to parse it in my own system. – Souvik Apr 16 '19 at 04:41
  • Any update on this question? – Souvik Apr 17 '19 at 04:46
  • [This answer](https://stackoverflow.com/a/29374882/2834978) could help; it seems to parse incrementally and catch exceptions. – LMC Apr 22 '19 at 03:02
  • This - `<Customer Info> This is & Customer Info <Customer Info>` is not XML. Tags cannot contain spaces, the closing tag is missing the `/`, and `&`s need to be presented as the predefined entity `&amp;`. Tell your 3pp to start sending you well-formed XML... – jon hanson Apr 24 '19 at 07:31
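
The well-formedness points raised in the comments can be checked with the parser that ships with the JDK. The following is only a minimal sketch: the class name `WellFormedCheck` and the method `isWellFormed` are hypothetical names chosen for illustration, and the "corrected" sample string is an assumption about what the third party presumably meant to send.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JAXB: reports whether a string is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Any parse failure (or parser configuration problem) counts as "not well-formed".
            return false;
        }
    }

    public static void main(String[] args) {
        // The sample from the question: space in the tag name, no closing '/', bare '&'.
        System.out.println(isWellFormed("<Customer Info> This is & Customer Info <Customer Info>"));
        // Valid tag name and closing tag, but the '&' is still unescaped.
        System.out.println(isWellFormed("<CustomerInfo>This is & Customer Info</CustomerInfo>"));
        // Assumed corrected form with the '&' escaped.
        System.out.println(isWellFormed("<CustomerInfo>This is &amp; Customer Info</CustomerInfo>"));
    }
}
```

Only the last string should pass, which is why a strict parser (and therefore a plain JAXB unmarshal) rejects the input before any POJO mapping happens.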

3 Answers

1

JSoup is designed to cope with fairly rough-and-ready HTML, so it works with more lenient parsing rules than the normal XML APIs (e.g. the built-in version of Xerces that comes with the JRE).

It can output XML to a W3C DOM suitable for use in JAXB:

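    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import org.jsoup.Jsoup;
    import org.jsoup.helper.W3CDom;
    import org.jsoup.parser.Parser;

    // Parse leniently with JSoup's XML parser, then convert to a W3C DOM.
    org.jsoup.nodes.Document soupDoc = Jsoup.parse(unescapedXml, "", Parser.xmlParser());
    org.w3c.dom.Document w3cDoc = new W3CDom().fromJsoup(soupDoc);

    // Unmarshal from the DOM rather than from the raw, unescaped text.
    JAXBContext jaxbContext = JAXBContext.newInstance(CustInfo.class);
    Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
    CustInfo custInfo = (CustInfo) jaxbUnmarshaller.unmarshal(w3cDoc);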

(Annoyingly, both JSoup and the W3C DOM call their class `Document`, hence the fully qualified names.)

This seems to cope well with any of '&' '<' or '>' in an XML attribute or body text, though there are bound to be combinations where the lack of escape chars is just too much.

df778899
0

There are a number of open-source frameworks that help; Jackson is one of the more popular ones. As a developer, unless I am creating a new third-party serializer and deserializer, I would leave the task of parsing to the utility.

Check out the `XmlMapper` class in Jackson to serialize and deserialize. See the methods `writeValue()` and `readValue()` to write to XML and read from XML respectively.

Rao Pathangi
0

You will need to pass the XML string through `StringEscapeUtils.escapeXml()` from Apache Commons Lang.

From the documentation:

Supports only the five basic XML entities (gt, lt, quot, amp, apos). Does not support DTDs or external entities.

Note that unicode characters greater than 0x7f are currently escaped to their numerical \u equivalent. This may change in future releases.
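
One caveat: escaping a whole document would also escape the `<` and `>` that delimit the tags, so the escaping has to be limited to the offending characters. The following is only a sketch of one way to do that for ampersands with the JDK alone; it is a swapped-in regex approach, not the Commons Lang method. The class `StrayAmpersandEscaper`, its methods and the sample element name `CustomerInfo` are hypothetical. It assumes that any `&` not starting a reference such as `&amp;`, `&#38;` or `&#x26;` is a literal ampersand, and it does not attempt to repair stray `<` or `>`.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pre-processing step: repair bare '&' before handing the text to a strict parser.
class StrayAmpersandEscaper {

    // Matches '&' that is NOT the start of a named, decimal or hexadecimal reference.
    private static final Pattern STRAY_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeStrayAmpersands(String xml) {
        return STRAY_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Parses the repaired XML with the JDK parser and returns the root element's text.
    static String textOfRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML is still not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<CustomerInfo>This is & Customer Info</CustomerInfo>";
        String fixed = escapeStrayAmpersands(raw);
        System.out.println(fixed);             // <CustomerInfo>This is &amp; Customer Info</CustomerInfo>
        System.out.println(textOfRoot(fixed)); // This is & Customer Info
    }
}
```

The repaired string could then be passed to JAXB, for example via `unmarshaller.unmarshal(new StringReader(fixed))`. Named references other than the five predefined ones (such as `&nbsp;`) are left untouched by this pattern and would still be rejected by the parser.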

Mohamed Anees A