There isn't really a "packet length" you can get at here, and even getting this information from C would be relatively difficult. This is because at the user layer, socket data is not represented in terms of packets: TCP hands you a continuous stream of bytes, and the boundaries of the packets that carried it are not preserved. (UDP is represented in terms of complete datagrams, but even a single UDP datagram received in userland may have arrived as multiple IP fragments.) So the question isn't really well-posed; what you should be asking is how to determine how many bytes are available to read on a socket.
So why doesn't the code you pasted tell you that? You should really understand what this code does because it is interesting. However, because it's sort of tangential to the information you're probably actually interested in, I'll leave that for later.
Unfortunately, PHP does not provide a means for you to call ioctl(2) with the raw socket FD. That would let you do something like ioctl(sock, FIONREAD, &n) in PHP-land to determine the number of bytes available to read. (Actually, apparently you can do this if you used fopen or fsockopen, but I guess you didn't.) Alas, this won't work.
Fortunately, there are two options for you:
Use non-blocking sockets. You can call socket_set_nonblock on your socket. Once you do this, any call to socket_read, socket_write, etc. will operate in non-blocking mode. This means that if you do e.g. $data = socket_read($socket, 1024), and fewer than 1024 bytes are available, the available bytes are returned. (N.B. if no data is available at all, socket_read returns false and socket_last_error() reports SOCKET_EWOULDBLOCK, so treat that case as "try again later" rather than as a failure. An empty string, on the other hand, means the peer closed the connection.)
Use socket_select to do the same for a range of sockets. This function tells you which sockets have readable data, which are in a writable state, and which have exceptional conditions (such as out-of-band data) that need handling.
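The non-blocking option can be wrapped in a small helper so the three possible outcomes of a read are explicit. This is a minimal sketch assuming PHP 8 (where sockets are Socket objects) with the sockets extension loaded; read_available is a hypothetical name, not a PHP function. It returns whatever is there right now, an empty string when nothing has arrived yet, and null when the peer has gone away or a real error occurred.

```php
<?php
// Hypothetical helper: read whatever is currently available on a
// non-blocking socket without waiting. Assumes PHP 8 + ext/sockets.
// Returns the bytes read, "" if nothing is available yet, or null if
// the peer closed the connection or a genuine error occurred.
function read_available(Socket $sock, int $max): ?string
{
    $data = socket_read($sock, $max);
    if ($data === false) {
        // On a non-blocking socket, "no data yet" shows up as a failed
        // read with EWOULDBLOCK (the same value as EAGAIN on Linux/BSD).
        if (socket_last_error($sock) === SOCKET_EWOULDBLOCK) {
            socket_clear_error($sock);
            return "";
        }
        return null; // genuine error
    }
    if ($data === "") {
        return null; // orderly shutdown by the peer
    }
    return $data;
}

// Usage with a connected pair of local sockets:
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
[$a, $b] = $pair;
socket_set_nonblock($b);
$first = read_available($b, 1024);   // nothing has been sent yet
socket_write($a, "hello");
$second = read_available($b, 1024);  // the 5 bytes just written
socket_close($a);
$third = read_available($b, 1024);   // the peer has closed
```

The caller decides what "" means for it (sleep and retry, or go back to socket_select), which keeps the helper free of any timing policy.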
Whichever approach you use, the usual way of handling a socket that does not receive enough data is to implement a timeout. socket_select provides a native interface for this; doing it manually with non-blocking sockets requires you to keep track of how long you've waited and to sleep between calls to socket_read. If you do not receive enough data within some period of time (say 10 seconds), close the socket and forget about it.
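One way to combine socket_select with an overall timeout is to track a single deadline across all reads. This is a sketch assuming PHP 8 and ext/sockets; read_exactly is a hypothetical helper name. Because the deadline covers the whole read, a client that trickles in one byte at a time still can't hold the connection open past it.

```php
<?php
// Hypothetical helper: read exactly $n bytes from $sock, giving up once
// $timeout seconds have passed in total. socket_select() does the waiting,
// so the socket may be blocking or non-blocking. Assumes PHP 8 + ext/sockets.
// Returns the bytes on success, or null on timeout, error, or early close.
function read_exactly(Socket $sock, int $n, float $timeout): ?string
{
    $buf = "";
    $deadline = microtime(true) + $timeout;
    while (strlen($buf) < $n) {
        $left = $deadline - microtime(true);
        if ($left <= 0) {
            return null; // timed out: the caller should close the socket
        }
        $read = [$sock];
        $write = null;
        $except = null;
        $sec = (int) $left;
        $usec = (int) (($left - $sec) * 1000000);
        $ready = socket_select($read, $write, $except, $sec, $usec);
        if ($ready === false) {
            return null; // select() itself failed
        }
        if ($ready === 0) {
            continue; // nothing arrived; the loop re-checks the deadline
        }
        // Ask only for the bytes still missing, never more.
        $chunk = socket_read($sock, $n - strlen($buf));
        if ($chunk === false || $chunk === "") {
            return null; // error, or the peer closed before sending everything
        }
        $buf .= $chunk;
    }
    return $buf;
}

// Usage with a connected pair of local sockets:
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
[$a, $b] = $pair;
socket_write($a, "abcdef");
$got = read_exactly($b, 4, 1.0);   // first 4 bytes
$rest = read_exactly($b, 2, 1.0);  // remaining 2 bytes
$none = read_exactly($b, 1, 0.2);  // nothing more was sent: times out
```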
If you receive more data than you're expecting, that's an error, and you close the socket and forget about it.
The protocol you're handling matters as well. Because you haven't said what protocol you're handling, I can't give you pointers on how much data to expect. But maybe you're implementing your own protocol for fun.
In this case, you need to decide how to encode how much data is on the line. Because the method you've said you're using in your question does some binary tricks, I'll do the same. You'll probably want to pack a 32-bit length into the start of the message. When you receive a connection, you wait for the first 4 bytes to come in. Once you've read those 4 bytes, you can unpack them to determine how much more you need to read.
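<?php
$payload = "Have a nice day!\n";
$len = strlen($payload) + 1; // + 1 for terminating NUL byte
// "N" packs $len as a 32-bit big-endian (network order) integer; "a{$len}"
// packs the payload NUL-padded to $len bytes, which adds the trailing NUL.
// (A bare "a" would pack only the first character.)
$packet = pack("Na{$len}", $len, $payload);
// socket_send needs a flags argument and may send fewer bytes than asked,
// so check its return value in real code.
socket_send($sock, $packet, strlen($packet), 0); // 4-byte length + payload
...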
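Then, in the server:
<?php
$r = socket_read($sock, 4);
// Because we don't know how much to read until we get the first 4 bytes.
// Obviously this is a DoS vector for someone to hold the connection open,
// so you will likely want to use socket_select to get that first bit of
// data. That's an exercise for you. (Note that a single read can also
// return fewer than 4 bytes; a robust version has to handle that too.)
$la = unpack("N", $r);
socket_set_nonblock($sock);
$len = $la[1]; // bytes still to read
$time = 0;
$payload = "";
while ($len > 0 && $time < 10) {
    // Ask only for what's left, so we never eat into the next message.
    $data = socket_read($sock, $len);
    if ($data === false) {
        if (socket_last_error($sock) != SOCKET_EWOULDBLOCK) {
            break; // a real error
        }
        $data = ""; // nothing available yet; try again after sleeping
    } elseif ($data === "") {
        break; // the peer closed the connection
    }
    $payload .= $data;
    $len -= strlen($data);
    if ($len == 0) {
        break;
    }
    sleep(1); // Feel free to usleep.
    $time++;
}
if ($len > 0) {
    // Timed out, hit an error, or the peer hung up early: give up.
    socket_close($sock);
}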
N.B. I didn't test this code past making sure I encoded the packed/unpacked data properly, so I'm not sure you can use it verbatim. Treat it as architectural pseudocode.
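On a real server you'll usually keep a per-connection buffer, append whatever each read returns, and peel complete messages off the front. The sketch below shows that bookkeeping with plain strings so it runs without a socket; frame_message and take_frame are hypothetical names. Unlike the client snippet above, it doesn't add a trailing NUL, since the length prefix alone marks where a message ends.

```php
<?php
// Hypothetical helpers illustrating 4-byte length-prefix framing.

// Prefix $payload with its length as a 32-bit big-endian ("N") integer.
function frame_message(string $payload): string
{
    return pack("N", strlen($payload)) . $payload;
}

// Try to remove one complete message from the front of $buffer.
// Returns the payload, or null if $buffer doesn't hold a whole message yet
// (in which case $buffer is left untouched so more data can be appended).
function take_frame(string &$buffer): ?string
{
    if (strlen($buffer) < 4) {
        return null; // length prefix not complete yet
    }
    $len = unpack("N", substr($buffer, 0, 4))[1];
    if (strlen($buffer) < 4 + $len) {
        return null; // payload not complete yet
    }
    $payload = substr($buffer, 4, $len);
    $buffer = substr($buffer, 4 + $len); // keep any bytes of the next message
    return $payload;
}

// Two messages arriving split at an awkward boundary:
$wire = frame_message("Have a nice day!\n") . frame_message("bye");
$buffer = substr($wire, 0, 10);  // first read: only part of message 1
$m0 = take_frame($buffer);       // not complete yet
$buffer .= substr($wire, 10);    // second read: the rest
$m1 = take_frame($buffer);       // first message
$m2 = take_frame($buffer);       // second message
```

In practice you'd also reject a length larger than your protocol allows before waiting for that many bytes; otherwise a client can announce a message of nearly 4 GiB and make you buffer indefinitely.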
Other protocols have other means of length encoding. HTTP/1.1, for instance, uses the Content-Length header in the common case (and chunked transfer encoding when the length isn't known up front).
Your Example Code
In a previous edit, I just dismissed this as looking at the first byte to get the length. I came back to this question because I saw an upvote and because some of my wording about packets was bothering me. I also realized that the conclusion I drew from my quick glance at that code was very wrong.
The code reads individual bytes off of a socket to decode a variable-length integer: the length of the remaining payload. (I wonder if this is code from a repo for ZeroMQ or for Apple push notifications or something.) It looks like the code does some weird stuff, but what's actually going on?
private function get_packet_length($socket) {
$a = 0;
$b = 0;
while(true) {
/* Read next single byte off of the socket */
$c = socket_read($socket, 1);
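/* Compare strictly: a plain !$c would also treat the byte "0" (0x30)
 * as a failure, because PHP considers the string "0" falsy. */
if ($c === false || $c === "") {
return 0;
}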
/* Use integer value of the byte instead of the character value */
$c = ord($c);
/*
* Get the low 7 bits of $c. Since $c represents an integer value
* of a single byte, its range is [0, 2^8). When we use only
* 7 bits, the range is constrained to [0, 2^7), or 0 - 127. This
* means we are using the 8th (top) bit as a flag of some kind; more on
* this momentarily.
*
* The shift amount is ($b++ * 7), since multiplication has
* higher precedence than a left shift. On the first iteration, we
* shift by 0 bits, the second by 7 bits, the third by 14 bits, etc.
* This means that we're incrementally building an integer value
* 7 bits at a time. We'll take a look at how this works
* out on real byte sequences in a sec.
*/
$a |= ($c & 0x7F) << $b++ * 7;
/*
* If we've tried to handle more than 5 bytes, this encoding doesn't
* make sense.
*/
if ($b > 5) {
return false;
}
/*
* If the top bit was 1, then we still have more bytes to read to
* represent this number. Otherwise, we are done reading this
* number.
*/
if (($c & 0x80) != 128) {
break;
}
}
return $a;
}
So let's consider what this means with a few different byte streams:
$stream = "\x01"; /* This is pretty obviously 1 */
$stream = "\x81\x80\x80\x00";
/* This is also 1, can you figure out why? */
$stream = "\xff\x01"; /* You might think it's 256, but this is 255 */
$stream = "\x80\x02"; /* This is 256. */
$stream = "\xff\xff\x01"; /* 32767 */
$stream = "\x80\x80\x02"; /* 32768 */
$stream = "\x0d\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21";
/*
* The length 13 (a single byte, since its top bit is clear), followed
* by 13 bytes that happen to represent "Hello, world!".
* This shows how such an encoding would be used in practice.
*/
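To check these examples yourself, it helps to have both directions of the encoding as plain string functions. This sketch is not from the original code: encode_length and decode_length are hypothetical names, and decode_length mirrors get_packet_length's 5-byte limit but reads from a string instead of a socket.

```php
<?php
// Hypothetical helpers: the same 7-bits-per-byte scheme as
// get_packet_length(), working on strings instead of a socket.

// Encode a non-negative integer: low 7 bits first, top bit = "more follows".
function encode_length(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException("length must be non-negative");
    }
    $out = "";
    do {
        $byte = $n & 0x7F;
        $n >>= 7;
        if ($n > 0) {
            $byte |= 0x80; // more 7-bit groups follow
        }
        $out .= chr($byte);
    } while ($n > 0);
    return $out;
}

// Decode from the start of $buf. Returns [value, bytesUsed], or null if the
// input ends mid-number or needs more than 5 bytes (like get_packet_length).
function decode_length(string $buf): ?array
{
    $value = 0;
    for ($i = 0; $i < strlen($buf) && $i < 5; $i++) {
        $c = ord($buf[$i]);
        $value |= ($c & 0x7F) << ($i * 7);
        if (($c & 0x80) === 0) {
            return [$value, $i + 1]; // top bit clear: this was the last byte
        }
    }
    return null;
}

$check = decode_length("\x81\x80\x80\x00"); // value 1, using 4 bytes
$bytes = encode_length(32768);              // the 3-byte form shown above
```

Note that the encoder always produces the shortest form, so "\x81\x80\x80\x00" decodes to 1 but encode_length(1) gives just "\x01".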
If you're familiar with byte order, you might notice that the 7-bit groups are transmitted least-significant first, i.e. in little-endian order. That might look weird (network byte order is big-endian), but we're not sending fixed-width integer values: we're sending a variable-length sequence of bytes, and the top bit of each byte tells us whether the length continues. Note that get_packet_length builds the value with shifts and masks on individual byte values rather than by reinterpreting memory, so its result does not depend on the byte order of the machine it runs on.
It's important to note that this encoding is defined by whatever protocol this code implements; it is not a standard way to get the length of anything off a socket. (The scheme itself is a common one: it's the unsigned LEB128 encoding, which Protocol Buffers also uses for its varints.) What makes it portable is that the protocol fixes the order of the 7-bit groups on the wire, so this code works the same on x86, PPC, or ARM.
When dealing with bytestreams or sending raw binary data over a network, always make sure to define the byte order of the data. If you do not, different implementations of your protocol will disagree with each other. See also Rob Pike's great post "The byte order fallacy" for more information on the subject of byte order, and why any time it's a problem, you're either confused or someone did something wrong (like define a protocol without fully defining the encoding of numbers).
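In PHP, defining the byte order mostly comes down to choosing explicit pack/unpack format codes. A small illustration: "N" and "V" are fixed big-endian and little-endian 32-bit formats, while "L" follows the host machine's order and so is a poor choice for anything that goes on the wire. read_u32_be is a hypothetical helper that decodes with shifts, in the style Rob Pike's post advocates, so it gives the same result on any host.

```php
<?php
// "N": 32-bit, always big-endian. "V": 32-bit, always little-endian.
// "L": 32-bit in the host's own byte order (avoid it for wire formats).
$big    = pack("N", 0x01020304);  // bytes 01 02 03 04 on every machine
$little = pack("V", 0x01020304);  // bytes 04 03 02 01 on every machine
$native = pack("L", 0x01020304);  // one of the two, depending on the CPU

// Hypothetical helper: decode a big-endian 32-bit value byte by byte.
// Only shifts on byte values are used, so the host's order never matters.
function read_u32_be(string $b): int
{
    return (ord($b[0]) << 24) | (ord($b[1]) << 16) | (ord($b[2]) << 8) | ord($b[3]);
}

$value = read_u32_be($big); // 0x01020304, i.e. 16909060
```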