
I'm using Java JAXB to unmarshal an XML request received over a socket. Before the actual XML

<?xml version="1.0"....

I receive these bytes

00 00 01 F9 EF BB BF

What are they? The size of the XML? A session ID?

The sender is using MSXML4 to execute requests against my service.

Furthermore, I can see that the sender expects this type of header (it truncates the first 7 bytes if I send the XML response directly).

So once I understand what these bytes are, is there a standard JAXB method that can add this header, or do I need to do it manually?

Thanks for any reply

Petter Friberg
  • 4
    Well `EF BB BF` is the UTF-8 BOM ... – Alex K. Sep 25 '15 at 16:05
  • 3
    When used, a BOM should be the first thing in a UTF-8 stream, so I presume that the first four bytes are not intended to be part of the message content. They could be a 32-bit message length, either in characters or in bytes, so that overall you have a string object serialized in (length, content) form. – John Bollinger Sep 25 '15 at 16:13
  • Rather than try to adapt to some random messaging protocol, it would be best to *choose* a protocol. (Message-length, message-content) is in fact a perfectly serviceable protocol, but it is concerning that the client ignores bytes 4 - 6 of the response when they are not `EF BB BF`. You need to know *all* these details, whether by specifying them yourself or by choosing a protocol with complete documentation to which you can refer. – John Bollinger Sep 25 '15 at 16:25
  • 1
    Thanks for the BOM pointer. The bytes before it are the length of the file (including the 3 BOM bytes). For now I calculate and send these bytes before unmarshalling with JAXB; if you have a "cleaner" solution, the question remains open. If you would like some credit for the BOM comment, please post it as an answer and I will try to give you some credit – Petter Friberg Sep 25 '15 at 16:56
  • You can add your solution as an answer (and accept it) to help anyone else who encounters this – Alex K. Sep 25 '15 at 19:49
  • Based on what JohnBollinger said about the first few bytes being a possible length, `00 00 01 F9` is 63745 for a 32-bit integer in network-byte order, and `01 F9` is also 63745 for a 16-bit integer. Is the XML data actually that size (in bytes or characters)? It is customary practice for network-based protocols to transmit multi-byte integers in network-byte order. – Remy Lebeau Sep 25 '15 at 21:46
  • 00 00 01 F9 = 256 + 249 = 505, and the length of the XML (chars) = 502. I tried another XML response with a different length and got the same result (00 00 01 E8 = 488, XML length 485). This is why, for now, I'm assuming that it is the length. It also makes sense, since it lets you size your byte buffer for the incoming stream. – Petter Friberg Sep 28 '15 at 12:18

1 Answer

1

These 7 bytes are a 4-byte length prefix followed by the 3-byte UTF-8 BOM.

The first 4 bytes give the message size as a big-endian 32-bit integer: 00 00 01 F9 = 0 + 0 + 256 + 249 = 505. This size includes the 3 bytes that indicate UTF-8 (EF BB BF), hence the XML length is 502.
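The arithmetic above can be checked with a minimal sketch. The class and method names (`FrameHeader`, `xmlLength`) are hypothetical, and it assumes the header is always exactly the 4-byte big-endian length followed by the UTF-8 BOM, as observed in the two captured messages:

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of JAXB or MSXML.
final class FrameHeader {
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the XML length implied by a 7-byte header: the length field minus the 3 BOM bytes. */
    static int xmlLength(byte[] header) {
        if (header.length < 7) {
            throw new IllegalArgumentException("header must be at least 7 bytes");
        }
        // ByteBuffer is big-endian (network byte order) by default.
        int total = ByteBuffer.wrap(header, 0, 4).getInt();
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (header[4 + i] != UTF8_BOM[i]) {
                throw new IllegalArgumentException("missing UTF-8 BOM after length field");
            }
        }
        return total - UTF8_BOM.length;
    }
}
```

For the bytes in the question, `00 00 01 F9 EF BB BF`, this yields 505 - 3 = 502.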

For ways to handle a BOM in the stream with JAXB, see:

Byte order mark screws up file reading in Java

why org.apache.xerces.parsers.SAXParser does not skip BOM in utf8 encoded xml?

JAXB unmarshaller BOM handlle

However, I preferred to handle the stream byte by byte, reading it into a StringBuffer (since I also need it in string format for logging).

My byte-by-byte reader waits for the '<' character, which is the first character of the XML message.
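As an alternative to scanning for '<', the length prefix can drive the read. The following is only a sketch, not the reader described above: `FramedReader` and `readXml` are hypothetical names, and it assumes the length field counts bytes (the captured messages only confirm that it matches the character count plus 3, which is the same thing for ASCII content).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads one (length, BOM, XML) message from a stream.
final class FramedReader {
    /** Returns the XML as a String with the BOM stripped. */
    static String readXml(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // 4-byte big-endian length, BOM included (assumed to be in bytes)
        if (length < 3) {
            throw new IOException("length field too small: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // blocks until the whole payload has arrived
        boolean hasBom = (payload[0] & 0xFF) == 0xEF
                && (payload[1] & 0xFF) == 0xBB
                && (payload[2] & 0xFF) == 0xBF;
        int offset = hasBom ? 3 : 0;
        return new String(payload, offset, length - offset, StandardCharsets.UTF_8);
    }
}
```

The returned string can be logged as is and then handed to the unmarshaller through a `StringReader`, so the parser never sees the BOM.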

To add the header (length plus BOM) before sending the response, I used a method like this:

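import java.nio.ByteBuffer;

// Builds the 7-byte header: a 4-byte big-endian length followed by the UTF-8 BOM.
// xmlLength is the length of the XML without the BOM; the length field includes the 3 BOM bytes.
public byte[] getBOMMessage(int xmlLength) {
    byte[] arr = new byte[7];
    ByteBuffer buf = ByteBuffer.wrap(arr); // big-endian by default
    buf.putInt(xmlLength + 3);
    arr[4] = (byte) 0xEF;
    arr[5] = (byte) 0xBB;
    arr[6] = (byte) 0xBF;
    return arr;
}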
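For completeness, here is a sketch of building the whole framed response (header plus XML) in one array. `FramedWriter` and `frame` are hypothetical names, and it assumes the length field should count UTF-8 bytes rather than characters, which only differs for non-ASCII content:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: prepends the length prefix and UTF-8 BOM to an XML string.
final class FramedWriter {
    /** Returns length prefix + UTF-8 BOM + XML bytes, ready to write to the socket. */
    static byte[] frame(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 3 + body.length);
        buf.putInt(body.length + 3); // length includes the 3 BOM bytes
        buf.put((byte) 0xEF).put((byte) 0xBB).put((byte) 0xBF);
        buf.put(body);
        return buf.array();
    }
}
```

The XML string could come from marshalling into a `StringWriter` first, so the length is known before anything is written to the socket.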
Petter Friberg