How long should a message header/prefix be?

Question

I've worked with a few protocols, and written my own. I have written some message formats with only 1 char to identify the message, and some with 4 chars. I don't feel that I'm experienced enough to tell which is better, so I'm looking for an answer which describes in which scenario one might be better than the other.

For performance, you would imagine that sending 2 bytes (A%1i) is faster than sending 5 bytes (ABCD%1i). However, I have noticed that when writing the protocol with the 1-byte prefix, if you have a bug which causes your code to not read enough data from the socket, you might get garbage data coming into your system.

So is the purpose of a 4-byte prefix just to provide a guarantee that your message is clean? Is it worth the performance you sacrifice? Do you really sacrifice any performance at all? Maybe it's better to have a 2- or 3-byte prefix?

I'm not sure if this question should be specific to TCP, or whether it applies to all transport protocols. Advice on this would be interesting.

Update: For interest, I will mention that Synergy uses 4-byte message prefixes, so for a mouse move delta the header is the same size as the actual data. Some have suggested just having a 1 or 2 byte prefix to improve efficiency. I wonder what drawbacks this would have?

Update: Also, I wonder if only the handshake really matters, if you're worried about garbage data. Synergy has a long handshake (a few bytes), so are the 4-byte message prefixes needed? I made a protocol recently that has only a 1-byte handshake, and that turned out to be a bad idea, since incompatible protocols were spamming the system with bad data (off the back of this, I might recommend at least having a long handshake).
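To make the handshake idea concrete, here is a minimal sketch of a longer handshake check. The tag `MYPROTO1`, the version fields and the function names are all hypothetical and chosen only for illustration; they are not taken from Synergy or any other real protocol.

```python
import struct

# Hypothetical values for illustration only.
MAGIC = b"MYPROTO1"                  # 8-byte handshake tag
HANDSHAKE = struct.Struct(">8sHH")   # tag, major version, minor version


def build_handshake(major: int, minor: int) -> bytes:
    """Build the bytes a well-behaved peer would send first."""
    return HANDSHAKE.pack(MAGIC, major, minor)


def check_handshake(data: bytes, expected_major: int = 1) -> bool:
    """Accept the peer only if the tag and the major version both match."""
    if len(data) < HANDSHAKE.size:
        return False
    tag, major, _minor = HANDSHAKE.unpack_from(data)
    return tag == MAGIC and major == expected_major


print(check_handshake(build_handshake(1, 0)))    # True
print(check_handshake(b"GET / HTTP/1.1\r\n"))    # False: some other protocol
```

With a 1-byte handshake, roughly one in 256 unrelated connections would pass by chance; an 8-byte tag makes an accidental match very unlikely.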

Performance implications depend on the size of your "payload" package. If the message is only 1 byte long (it only contains the binary state of something), then your extra 3 bytes will have considerable implications; if the message is 5KB (or more), the extra 3 bytes will have a negligible effect. Rule of thumb: I would treat any header bigger than about 5% of the payload size as significant. AFAIK the size of the prefix does NOT affect error handling and behaviour, but I may be wrong here. Furthermore, if you intend to send single messages over the network, the extra 3 bytes may have no effect at all. – Germann Arlington, Jul 20 '12 at 13:14
For what Synergy does, low latency is probably much much more important than low bandwidth use. My gut feeling is that this is a pattern - if you need to send many tiny updates, you're unlikely to send enough data in total to congest the connection. — millimoose, Jul 22 '12 at 17:14

score 1 · Accepted Answer · edited May 23 '17 at 11:55

The purpose of the header is to make it easier to solve the frame synchronization problem (byte aligning in serial communication). To synchronize, the receiver looks for anything in the data stream that "looks like" a start-of-message header. If you have lots of different kinds of valid start-of-message headers, and all of them are 1 byte long, then you will inevitably get a lot of "false frame synchronizations" -- garbage from something that "looks like" a start-of-message header, but isn't. It would be better to pick some other header that makes it "unlikely" that anything in the serial data stream "looks like" a valid start-of-message header.
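A rough way to see the effect is to count how often a candidate header shows up by accident in arbitrary data. The sketch below uses uniformly random bytes as a stand-in for "garbage" (an assumption; real payloads are rarely uniform) and the `A` and `ABCD` prefixes from the question. The function name is made up for this example.

```python
import random


def count_false_syncs(stream: bytes, header: bytes) -> int:
    """Count every position in `stream` where `header` appears."""
    count = 0
    pos = stream.find(header)
    while pos != -1:
        count += 1
        pos = stream.find(header, pos + 1)
    return count


# Random bytes standing in for payload data or line noise.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(100_000))

one_byte = count_false_syncs(noise, b"A")
four_byte = count_false_syncs(noise, b"ABCD")
print("1-byte header matches:", one_byte)
print("4-byte header matches:", four_byte)
```

For uniform random data, a given 1-byte header is expected to match about once every 256 positions, while a given 4-byte header is expected to match about once every 256^4 (roughly 4.3 billion) positions.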

It is inevitable that you will get garbage data coming into your system, no matter how you design the packet header. Whatever you use to handle these other problems (such as occasional bit errors in the middle of the message) should also be adequate to handle the occasional "false frame synchronization" garbage. In some systems, any bad data is quickly overwritten by fresh new good data, and if you blink you might never see the bad data. Other systems need at least some sort of error detection in the footer to reject the bad data. Yet other systems need to not only detect such errors, but somehow keep re-sending that message -- until both sides are convinced that an error-free version of that message has been successfully received.
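As one possible form of "error detection in the footer", the sketch below appends a CRC-32 to each message and rejects anything whose footer does not match. It assumes the receiver already knows where one message ends (for example from a length field), and the `frame`/`unframe` names are invented for this example.

```python
import struct
import zlib
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte big-endian footer."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(data: bytes) -> Optional[bytes]:
    """Return the payload if the footer matches, otherwise None."""
    if len(data) < 4:
        return None
    payload, footer = data[:-4], data[-4:]
    (expected,) = struct.unpack(">I", footer)
    if zlib.crc32(payload) != expected:
        return None          # corrupted or mis-synchronized: reject it
    return payload


good = frame(b"hello")
bad = bytes([good[0] ^ 0x01]) + good[1:]   # flip one bit in the payload
print(unframe(good))   # b'hello'
print(unframe(bad))    # None
```

A check like this catches bit errors and most false synchronizations with the same mechanism, which is the point made above: the header does not have to do that job alone.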

As Oleksi implied, in some systems the latency is not significantly different between sending a single binary bit (100 ms) and sending 10 bytes (102.4 ms). So the advantages of using a tiny header (2.4% less latency!) may not be worth it compared to the advantages of using a more verbose header (easier debugging; easier to make backward-compatible and forward-compatible; easier to test the effect of minor changes "in isolation" without upgrading both sides in lockstep to the new protocol which is completely incompatible with the old protocol).

Perhaps you could get the best of both worlds by (a) keeping the verbose, easy-to-debug headers on messages that are so rarely used that the effect of tiny headers is too small to measure (which I suspect is nearly all messages), and (b) introducing a "tiny header" format for any kind of message where the effect of tiny headers is "noticeably better" or at least measurable. It looks like the Synergy protocol is flexible enough to add such a "tiny header" format in a way that is easily distinguishable from the other kinds of message headers.
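One way such a mixed scheme could work, purely as a sketch: if verbose headers are 4 ASCII characters, every one of their bytes is below 0x80, so a first byte with the high bit set can safely mark the tiny format. The codes `MOVE` and `KEYD`, the byte values and the function name below are hypothetical and are not Synergy's actual message codes.

```python
from typing import Tuple

# Hypothetical mapping from tiny 1-byte codes to verbose 4-byte ASCII codes.
TINY_TO_VERBOSE = {0x80: b"MOVE", 0x81: b"KEYD"}


def split_header(message: bytes) -> Tuple[bytes, bytes]:
    """Return (verbose_code, payload) for either header format."""
    if not message:
        raise ValueError("empty message")
    first = message[0]
    if first & 0x80:
        # Tiny format: ASCII header bytes are always below 0x80,
        # so this byte cannot be the start of a verbose header.
        if first not in TINY_TO_VERBOSE:
            raise ValueError(f"unknown tiny header 0x{first:02x}")
        return TINY_TO_VERBOSE[first], message[1:]
    if len(message) < 4:
        raise ValueError("truncated verbose header")
    return message[:4], message[4:]


print(split_header(b"MOVE\x01\x02"))   # (b'MOVE', b'\x01\x02')
print(split_header(b"\x80\x01\x02"))   # (b'MOVE', b'\x01\x02')
```

Both forms decode to the same verbose code, so the rest of the handler code and any debug logging can stay unchanged.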

I use Synergy between my laptop and a few desktop machines. I am glad someone is trying to make it even better.

score 0 · Answer 2 · answered Jul 22 '12 at 17:09

The performance will depend on the content of the message you are sending. If your content is several kilobytes, it doesn't really matter how many bytes your header is. For now, I would choose the scheme that's easiest to work with, because the performance difference between sending one byte, or four bytes is going to be negligible compared to the actual data that you're sending.
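To put rough numbers on this, the sketch below computes the share of a message taken up by its header, and the total bytes on the wire if each message travels in its own TCP segment. The 40-byte figure is the minimum IPv4 plus TCP header size without options; one message per segment and no link-layer framing are simplifying assumptions, and the function names are made up for this example.

```python
def header_overhead(header_len: int, payload_len: int) -> float:
    """Fraction of the application-level message taken up by the header."""
    return header_len / (header_len + payload_len)


def wire_bytes(header_len: int, payload_len: int, transport_overhead: int = 40) -> int:
    """Bytes sent if the message travels alone in one TCP/IPv4 segment.

    40 = 20-byte IPv4 header + 20-byte TCP header, no options (assumption).
    """
    return transport_overhead + header_len + payload_len


# A 4-byte payload, such as a small mouse delta:
print(header_overhead(4, 4))             # 0.5
print(header_overhead(1, 4))             # 0.2
print(wire_bytes(4, 4), wire_bytes(1, 4))  # 48 45

# A 5000-byte payload: the header share is about 0.0008
print(header_overhead(4, 5000))
```

Under these assumptions, shrinking the header from 4 bytes to 1 saves 3 of 48 bytes per tiny message, because the transport headers dominate.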

answered Jul 22 '12 at 17:09

Oleksi

As I mentioned, Synergy's message data is in some cases smaller than the header (e.g. delta mouse moves, key presses). But as millimoose points out, changing from 4 bytes to 1 byte would probably make a negligible difference in terms of latency, which I agree with (though I'd like to see it proven). What is your take on this? – Nick Bolton Jul 22 '12 at 17:24

If the message is so small, it's probably super fast to send anyway, regardless of your header size. If you're sending thousands of messages, and you start noticing a performance hit, then you can consider optimizing the header size. For now, I'd do whatever makes your life easier, knowing that it _probably_ won't be a performance bottleneck. – Oleksi Jul 22 '12 at 17:27
