1

My application exposes a REST API using Apache Wink and the JAXB provider built into the JDK (1.6). Sometimes I receive PUT requests that contain control characters.

As far as my application is concerned, the control characters constitute a valid and meaningful input. However, the application throws the notorious exception saying that it cannot digest these characters:

Message: An invalid XML character (Unicode: 0x13) was found in the element content of the document.]
at org.apache.wink.common.internal.providers.entity.xml.JAXBXmlProvider.readFrom(JAXBXmlProvider.java:107)
at org.apache.wink.server.internal.registry.ServerInjectableFactory$EntityParam.getValue(ServerInjectableFactory.java:190)
at org.apache.wink.common.internal.registry.InjectableFactory.instantiate(InjectableFactory.java:67)
at org.apache.wink.server.internal.handlers.CreateInvocationParametersHandler.handleRequest

There is probably no way to tell the JAXB provider to ignore these characters (since at some point I will have to parse the XML, and illegal is illegal). How can I make this work? Is there a way to instruct the REST client to escape these characters before sending them?

Vitaliy
  • Control characters are [not allowed in XML](http://www.w3.org/TR/2008/REC-xml-20081126/#charsets) so your data is not XML. The trouble with trying to define _it is like X except for Y_ is that it would be difficult to define an API where Y can be whatever the individual thinks it should be. It is unlikely JAXB will accommodate you. You could use a filter to strip the illegal characters if that is acceptable; otherwise you will have to encode or escape the data in legal character data (e.g. with Base64). – McDowell Jan 07 '13 at 15:05
  • @McDowell please post as answer and I will accept. Thanks. – Vitaliy Jan 07 '13 at 15:07
  • @McDowell will it work in JSON? – Vitaliy Jan 07 '13 at 15:08
  • Alas, no - control characters cannot be encoded in JSON strings as per [the specification](http://www.json.org/) – McDowell Jan 07 '13 at 15:16

2 Answers

2

Control characters are not allowed in XML, so your data is not XML. The trouble with trying to define "it is like X except for Y" is that it would be difficult to define an API where Y can be whatever the individual thinks it should be. It is unlikely JAXB will accommodate you. You could use a filter to strip the illegal characters if that is acceptable; otherwise you will have to encode or escape the data in legal character data (e.g. with Base64).
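As a rough illustration of the Base64 option, the sketch below wraps text containing a control character so that only legal XML character data travels in the element. The class name `ControlCharPayload` and its methods are hypothetical, and it assumes both the client and the service agree that the element carries Base64 of UTF-8 bytes. It uses `java.util.Base64`, which needs Java 8 or later; on the JDK 1.6 mentioned in the question, `javax.xml.bind.DatatypeConverter.printBase64Binary` and `parseBase64Binary` should serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: not part of Wink or JAXB.
final class ControlCharPayload {

    // Encode arbitrary text (control characters included) into the
    // Base64 alphabet, which consists only of legal XML characters.
    static String encode(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reverse the encoding on the receiving side, after JAXB has
    // unmarshalled the element content as an ordinary string.
    static String decode(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "before\u0013after"; // contains U+0013
        String encoded = encode(raw);
        System.out.println(encoded);
        System.out.println(raw.equals(decode(encoded)));
    }
}
```

If the field is declared as `byte[]` in the JAXB class, JAXB maps it to `xs:base64Binary` by default, so the manual step may only be needed on clients that build the XML by hand.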

McDowell
  • I will write a filter. Given the encoding and the ServletInputStream, is there an easy way to determine whether the char I just read is a control character? – Vitaliy Jan 07 '13 at 15:28
  • You will need to determine the document encoding from the [declaration](http://www.w3.org/TR/2004/REC-xml-20040204/#NT-XMLDecl) (see also [appendix F](http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing)) and decode it to determine if the [code point](http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#codePointAt%28char%5B%5D%2C%20int%29) is supported. You can test against the [character ranges](http://www.w3.org/TR/2008/REC-xml-20081126/#charsets) in the XML spec. Then provide your own stream to the forward chain. – McDowell Jan 07 '13 at 15:46
  • Say I determined that the encoding is UTF-8. I see that the Character class contains a method isISOControl, which has an overload that takes a Unicode code point. Will it work or do I need another conversion? – Vitaliy Jan 07 '13 at 20:14
  • `Character.isISOControl('\n');` evaluates to true, so this isn't exactly what you want. This question has been asked before - see [removing invalid XML characters from a string in java](http://stackoverflow.com/questions/4237625) for example. The problem of filtering invalid character ranges from XML-like data in a servlet filter probably deserves its own question (though don't be surprised if the answerers question why you would send invalid data to such a service instead of using a more appropriate format.) – McDowell Jan 07 '13 at 22:39
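The range test discussed in these comments can be sketched as follows. This is a minimal example, assuming the input has already been decoded into a Java `String`; the class `XmlChars` and its method names are made up for illustration. The ranges are those of the `Char` production in the XML 1.0 specification linked above, checked per code point rather than with `Character.isISOControl`, which also reports legal characters such as line feed.

```java
// Hypothetical helper: not part of the JDK, Wink, or JAXB.
final class XmlChars {

    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Copy the input, dropping every code point that XML 1.0 forbids.
    // Unpaired surrogates are dropped too, since they fall outside the ranges.
    static String stripInvalid(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Line feed is an ISO control character, yet legal in XML.
        System.out.println(Character.isISOControl('\n'));
        System.out.println(isXml10Char('\n'));
        // U+0013 is the character from the exception in the question.
        System.out.println(isXml10Char(0x13));
        System.out.println(stripInvalid("a\u0013b\nc").equals("ab\nc"));
    }
}
```

Note that stripping discards data. Since the question treats the control characters as meaningful input, this only fits if losing them is acceptable.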
2

The characters in question are not "unprintable XML characters". They are unprintable non-XML characters.

Michael Kay