
I'm making a Java server for the latest version of the WebSocket draft. I managed to establish the connection, and that's working great.

The problem is that I don't understand how the data is encoded. I've been looking for an example of how it has to be done, but I couldn't find anything, so I'm trying to work it out myself and need some help.

Here is an image of the frame.

But I don't understand where the payload begins. What is:

Extended payload length (16/63) (if payload len==126/127)

Is that where my payload should go?

Can someone help? As you can see, I'm completely lost...

pimvdb
Andres

1 Answer


The problem is that the length does not always fit in 7 bits (7 bits can only express the numbers 0 to 127). The values 0 to 125 are the payload length itself; the two remaining values signal that the following 2 or 8 bytes hold the length instead:

  • 126 means the following 2 bytes are used for the length
  • 127 means the following 8 bytes are used for the length

So the payload starts at index 2, 4 or 10 if the frame is not masked. When it is masked, the payload starts at index 6, 8 or 14, because the 4 mask bytes sit between the length and the payload.
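
Since the question is about a Java server, here is a minimal sketch of that offset calculation. The class and method names (`FrameOffsets`, `payloadOffset`) are made up for illustration; the only inputs it relies on are the mask flag (the top bit of the second byte) and the 7-bit length field (the remaining bits of that byte).

```java
final class FrameOffsets {
    /**
     * Returns the index of the first payload byte within a frame,
     * based only on the second header byte.
     * (Hypothetical helper, not part of any library.)
     */
    static int payloadOffset(byte secondByte) {
        boolean masked = (secondByte & 0x80) != 0; // top bit: mask flag
        int len7 = secondByte & 0x7F;              // low 7 bits: length field

        int offset = 2;                            // the two fixed header bytes
        if (len7 == 126) {
            offset += 2;                           // 2-byte extended length
        } else if (len7 == 127) {
            offset += 8;                           // 8-byte extended length
        }
        if (masked) {
            offset += 4;                           // 4 mask bytes
        }
        return offset;
    }
}
```

For example, a second byte of `0x85` (masked, length 5) gives an offset of 6, and `0x7E` (unmasked, length field 126) gives 4.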

I previously posted some pseudocode about decoding the payload data.


To actually get the length as a "real number" (instead of separate bytes), you can use bitwise shift operators as follows (in case there are two bytes for the length):

var length = (bytes[2] << 8) | (bytes[3] << 0);

This will calculate it like this:

Suppose:

  • bytes[2] is 01101001 (105 in base 10)
  • bytes[3] is 10100101 (165 in base 10)

Then << does the following:

01101001 00000000   // moved 8 places to the left, filled with zeroes
         10100101   // moved 0 places (nothing really happening, you can eliminate '<< 0')

| (bitwise OR) combines them. Because the two values have no overlapping bits here, this is the same as adding them:

01101001 00000000
         10100101
-----------------  |
01101001 10100101      (in base 10 that's 27045)

So if you have the bytes 105 and 165, then they represent a length of 27045.
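
The snippet above is JavaScript. In Java there is one extra catch: `byte` is signed, so 165 is stored as -91, and each byte needs `& 0xFF` before shifting. A rough Java equivalent might look like the sketch below; the names `PayloadLength`, `length16` and `length64` are invented for this example, and `bytes` is assumed to hold the frame from its first byte.

```java
final class PayloadLength {
    /** Combines bytes[2] and bytes[3] into the length (7-bit field was 126). */
    static int length16(byte[] bytes) {
        // & 0xFF turns the signed byte into its unsigned value 0..255.
        return ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
    }

    /** Combines bytes[2] through bytes[9] into the length (7-bit field was 127). */
    static long length64(byte[] bytes) {
        long length = 0;
        for (int i = 2; i < 10; i++) {
            // Shift what we have one byte to the left, then append the next byte.
            length = (length << 8) | (bytes[i] & 0xFF);
        }
        return length;
    }
}
```

With the bytes 105 and 165 at indices 2 and 3, `length16` returns 27045, matching the calculation above. Without the `& 0xFF`, the second byte would be sign-extended and the result would come out negative.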

Community
pimvdb
  • Great link. Another question: how do I know how many bytes to read? I don't know the size of the WebSocket frame, and in Java I'm reading with `dataInputStream.read(bytes, totalToRead, leftToRead)`. Any idea? I'm totally lost. Thanks again! – Andres Oct 26 '11 at 22:45
  • @Andres: I don't know Java, but as I said the frame includes the length. In the three cases I posted there you can get the length with either the second byte (last 7 bits), the third and fourth byte or the third to the tenth byte. – pimvdb Oct 27 '11 at 07:56
  • @Andres: If you don't know how to interpret bytes and convert them to a length, please have a look at my edit. – pimvdb Oct 27 '11 at 08:12
  • I've been reading the hybi draft and it says that data from the server to the browser doesn't need to be masked, so can I just send a string in UTF-8 and the browser should get it? I ask because I read this, and it does some encoding: http://stackoverflow.com/questions/7087522/php-websocket-server-hybi10 – Andres Oct 27 '11 at 12:16
  • @Andres: No I'm not doing that myself. It does not mean it's forbidden though. But if you send the header data and the length with the data appended, it should work fine. – pimvdb Oct 27 '11 at 14:26
  • Last question: is the payload in UTF-8, or is it just encoded as you explain in the other post? I just decoded it, so I should have the plain text, right? – Andres Oct 28 '11 at 02:52
  • @Andres: It depends; the hybi draft supports both UTF-8 data and binary data. What data is used depends on the opcode (last 4 bits of the first byte). If you send in JavaScript like `ws.send(string)`, you'll obtain a text frame (UTF-8) and the opcode is 1. If you send it like `ws.send(arraybuffer)`, you'll obtain raw binary data and the opcode is 2. – pimvdb Oct 28 '11 at 08:57