I fully understand the error "An invalid XML character (Unicode: 0x3) was found":

Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x3) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2923) ~[na:1.8.0_111]

But I cannot believe that the marshaller wrote this character out in the first place.

I marshalled an object that contained portions of a .gz file, and the marshalling succeeded. When I tried to unmarshal the result, I got this error.

The marshaller and unmarshaller I used were from /com/sun/xml/internal/bind/v2/runtime/ -- rt.jar.

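// Marshal the object to an in-memory string.
StringWriter stringWriter = new StringWriter();
Marshaller marshaller = context.createMarshaller();
marshaller.marshal(object, stringWriter);
// Unmarshal the same string; this is the call that throws.
Unmarshaller unmarshaller = context.createUnmarshaller();
unmarshaller.unmarshal(new StringReader(stringWriter.toString()));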
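
For reference, the parser side of this failure can be reproduced without JAXB. The following is only a minimal sketch under stated assumptions: the class name `InvalidCharRepro` and the method `parses` are made up for illustration, and it uses the JDK's built-in `DocumentBuilder` instead of the JAXB unmarshaller. It shows that a document with a raw U+0003 in element content is rejected as not well-formed, which is the same condition the stack trace reports.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original post.
final class InvalidCharRepro {

    /** Returns true if the JDK's default XML parser accepts the given text. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. "An invalid XML character (Unicode: 0x3) ..."
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a>ok</a>"));     // prints true
        System.out.println(parses("<a>\u0003</a>")); // prints false
    }
}
```

The default parser may also log the fatal error to standard error before the exception is caught.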

This is an obvious reflexivity issue, and I don't know how to deal with it.

If anyone has had the same issue, please advise how to overcome it, ideally without changing the marshaller.

P.S. From my understanding, a marshaller should always be reflexive and should never marshal something that it cannot unmarshal. It's a shame that the one in rt.jar is not.

3 Answers

A third thing I forgot to mention...

Some characters cannot appear literally in XML text and must be escaped as:

<   &lt;
>   &gt;
&   &amp;
For attribute values only:
"   &quot;
'   &apos;

If any of your strings can contain them, they must be escaped or, when they are not in attribute values, wrapped in a CDATA section.
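
The escaping table above can be sketched as a small helper. This is an illustrative sketch only: `XmlEscape.escape` is a hypothetical name, and it escapes all five characters everywhere for simplicity, which is safe in both element content and attribute values. Note that it covers markup characters only. A control character such as U+0003 is outside the set of characters XML 1.0 allows, and escaping it as `&#3;` does not make the document well-formed.

```java
// Hypothetical helper illustrating the five predefined XML entities.
final class XmlEscape {

    /** Replaces the five markup-significant characters with their entities. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\"")); // prints a&lt;b &amp; &quot;c&quot;
    }
}
```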

See here: Invalid Characters in XML

Vadim

Why don't you try removing the invalid characters?

This has already been discussed in this thread.

Check this thread.

Hope this helps!!

karthik
  • Thanks for your reply, but it's not that simple: I need to fail if I cannot unmarshal, not simply remove the bad character. I am considering unmarshalling the marshalled entities and proceeding only if no exception is thrown. Also, maybe there is some way to make the JAXB implementation behave properly; as I said, it simply _must_ be reflexive. – Alexey Mironchenko Dec 08 '16 at 12:35

Why do this with a marshal/unmarshal round trip? You start with a Java object. How did you get it, and why does it contain a character that is valid in Java but invalid in XML? Depending on your requirements, you have three options:

  1. If the data in the Java object is correct and must be passed inside the XML, you have to encode it with Base64. Raw binary data cannot be represented directly in XML.

  2. If it is bad data and you have to treat it as an error, do so before marshalling.

  3. If you do not need the invalid bytes, remove them as suggested.
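
The three options can be sketched roughly as follows. This is a hedged illustration, not a definitive implementation: the class `XmlSafeText` and all of its method names are hypothetical, and the validity check assumes the `Char` production from the XML 1.0 specification (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF).

```java
import java.util.Base64;

// Hypothetical helper; names are for illustration only.
final class XmlSafeText {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Option 1: carry binary data as Base64 text. */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    /** Option 2: fail before marshalling if the string cannot be represented. */
    static void requireXmlSafe(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                throw new IllegalArgumentException(
                        String.format("Invalid XML 1.0 character U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp);
        }
    }

    /** Option 3: silently drop characters that XML 1.0 does not allow. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().filter(XmlSafeText::isXml10Char).forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Calling `requireXmlSafe` on each string field before `marshal` gives a failure at write time instead of at read time, which is one way to get the fail-fast behaviour without changing the marshaller.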

On the other hand, check your marshaller's default encoding. A marshaller has a "jaxb.encoding" property. Does it match what the unmarshaller uses? For example, for "utf-8":

marshaller.setProperty("jaxb.encoding", "utf-8");
Vadim
  • Yes, the encoding is uniform. I just want to be able to put any string in my string fields, with no validation on the business-logic side. Encoding with Base64 is not a good fit. We're moving to JSON anyway. – Alexey Mironchenko Dec 08 '16 at 14:38
  • There is no other way to carry binary data in either XML or JSON. Both are string-based protocols. How do you plan to deal with binary data in JSON? – Vadim Dec 08 '16 at 15:04
  • If you have Strings, where did those binary bytes come from? 0x03 is not a character. – Vadim Dec 08 '16 at 15:12
  • What do you have in the first line of the marshalled XML? – Vadim Dec 08 '16 at 15:14