For any non-trivial application (I.E. the application must receive and handle different kinds of messages with different lengths), the solution to your particular problem isn't necessarily just a programming solution - it's a convention, I.E. a protocol.
In order to determine how many bytes you should pass to your read
call, you should establish a common prefix, or header, that your application receives. That way, when a socket first has reads available, you can make decisions about what to expect.
A binary example might look like this:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
enum MessageType {
MESSAGE_FOO,
MESSAGE_BAR,
};
struct MessageHeader {
uint32_t type;
uint32_t length;
};
/**
* Attempts to continue reading a `socket` until `bytes` number
* of bytes are read. Returns truthy on success, falsy on failure.
*
* Similar to @grieve's ReadXBytes.
*/
int readExpected(int socket, void *destination, size_t bytes)
{
/*
* Can't increment a void pointer, as incrementing
* is done by the width of the pointed-to type -
* and void doesn't have a width
*
* You can in GCC but it's not very portable
*/
char *destinationBytes = destination;
while (bytes) {
ssize_t readBytes = read(socket, destinationBytes, bytes);
if (readBytes < 1)
return 0;
destinationBytes += readBytes;
bytes -= readBytes;
}
return 1;
}
int main(int argc, char **argv)
{
int selectedFd;
// use `select` or `poll` to wait on sockets
// received a message on `selectedFd`, start reading
char *fooMessage;
struct {
uint32_t a;
uint32_t b;
} barMessage;
struct MessageHeader received;
if (!readExpected (selectedFd, &received, sizeof(received))) {
// handle error
}
// handle network/host byte order differences maybe
received.type = ntohl(received.type);
received.length = ntohl(received.length);
switch (received.type) {
case MESSAGE_FOO:
// "foo" sends an ASCII string or something
fooMessage = calloc(received.length + 1, 1);
if (readExpected (selectedFd, fooMessage, received.length))
puts(fooMessage);
free(fooMessage);
break;
case MESSAGE_BAR:
// "bar" sends a message of a fixed size
if (readExpected (selectedFd, &barMessage, sizeof(barMessage))) {
barMessage.a = ntohl(barMessage.a);
barMessage.b = ntohl(barMessage.b);
printf("a + b = %d\n", barMessage.a + barMessage.b);
}
break;
default:
puts("Malformed type received");
// kick the client out probably
}
}
You can likely already see one disadvantage of using a binary format - for each attribute greater than a char
you read, you will have to ensure its byte order is correct using the ntohl
or ntohs
functions.
An alternative is to use byte-encoded messages, such as simple ASCII or UTF-8 strings, which avoid byte-order issues entirely but require extra effort to parse and validate.
There are two final considerations for network data in C.
The first is that some C types do not have fixed widths. For example, the humble int
is defined as the word size of the processor, so 32 bit processors will produce 32 bit int
s, while 64 bit processors will produces 64 bit int
s. Good, portable code should have network data use fixed-width types, like those defined in stdint.h
.
The second is struct padding. A struct with different-widthed members will add data in between some members to maintain memory alignment, making the struct faster to use in the program but sometimes producing confusing results.
#include <stdio.h>
#include <stdint.h>
int main()
{
struct A {
char a;
uint32_t b;
} A;
printf("sizeof(A): %ld\n", sizeof(A));
}
In this example, its actual width won't be 1 char
+ 4 uint32_t
= 5 bytes, it'll be 8:
mharrison@mharrison-KATANA:~$ gcc -o padding padding.c
mharrison@mharrison-KATANA:~$ ./padding
sizeof(A): 8
This is because 3 bytes are added after char a
to make sure uint32_t b
is memory-aligned.
So if you write
a struct A
, then attempt to read a char
and a uint32_t
on the other side, you'll get char a
, and a uint32_t where the first three bytes are garbage and the last byte is the first byte of the actual integer you wrote.
Either document your data format explicitly as C struct types or, better yet, document any padding bytes they might contain.