
I am currently trying to port a C program that deals with datagram (UDP) packets to some higher-level language. As the packets can be of variable size, they start with an integer stating their size. In C, I call recv with the MSG_PEEK flag to read only this value first, then allocate a fitting buffer and read the rest of the packet. The code (simplified) goes like this:

// Simplified message format.
struct message {
    int length;
    char text[]; // flexible array member
};
struct message *m = malloc (sizeof(int));

// Peek at just the length field.
recv (sock, m, sizeof(int), MSG_WAITALL | MSG_PEEK);
int txtlen = ntohl (m->length) * sizeof(char);
int msglen = sizeof(int) + txtlen;

// Read complete packet.
m = realloc (m, msglen);
read (sock, m, msglen);
m->text[txtlen] = '\0';

// Show result.
printf("%s\n", &m->text);

I want to avoid the seemingly common practice of allocating an enormous buffer and hoping that no bigger packets will arrive. So is something like peeking at the datagram, or determining its complete length beforehand, possible in higher-level languages like Python or Java?
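
For illustration, here is roughly what I am hoping for, sketched in Python (assuming the socket module exposes MSG_PEEK on the target platform; the address and port are made up):

import socket, struct

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 5000))  # hypothetical port

# Peek at the 4-byte length prefix without consuming the datagram.
prefix = sock.recv(4, socket.MSG_PEEK)
(txtlen,) = struct.unpack("!i", prefix)  # "!" is network byte order, like ntohl()

# Receive the whole datagram into an exactly-sized buffer.
data = sock.recv(4 + txtlen)
print(data[4:].decode())

(Java's DatagramSocket does not seem to offer a direct equivalent of MSG_PEEK, so I am especially unsure about that side.)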

XZS
    UDP packets can't be "huge" (see e.g. http://stackoverflow.com/questions/1098897/what-is-the-largest-safe-udp-packet-size-on-the-internet). Are you sure you should be worrying about this? – Mat Oct 28 '11 at 11:55
  • Unless it's an embedded system with *very* limited resources, definitely not. – Karoly Horvath Oct 28 '11 at 11:59
  • I think it's not a question about the *language*, but a question of the *abstractions* being used. For example you can do that in Perl, considering Perl is a "higher-level" language, but still there may exist some modules using sockets that won't allow you to "peek" at the next message. – U. Windl Jun 08 '22 at 12:27

3 Answers


I want to avoid the seemingly common practice of allocating an enormous buffer and hoping that the packet won't get any bigger.

Not sure what you mean by this. A UDP packet arrives all at once, so the initial integer tells you exactly how big your buffer should be; it won't "grow" after it arrives.

Since you're appending a null character, you need to account for that in your length calculation:

int msglen = sizeof(int) + txtlen + 1;

Be careful when you use realloc():

m = realloc (m, msglen);

If realloc() fails, it returns NULL, so this assignment sets m to NULL. That means you'll lose your only reference to the memory that was originally allocated, so you'll never be able to free() it. Try something like this:

void *tmp = realloc(m, msglen);
if (tmp == NULL) {
  // handle the error; m still points to the original allocation
}
m = tmp;

And when you print the data, m->text evaluates to the address of the first character, so you can use

printf("%s\n", m->text);

Alternatively, you could define your structure with a fixed size, as

struct message {
  int length;
  char *text;
};

Then you can use malloc() to allocate (only) your text buffer, and a scatter read (readv()) to consume the length prefix and the payload from the single datagram:

struct message m;
recv(sock, &m.length, sizeof(int), MSG_WAITALL | MSG_PEEK);
int txtlen = ntohl(m.length);  // convert from network byte order
m.text = malloc(txtlen + 1);   // +1 for the null that you'll append

// One read() consumes the whole datagram, length prefix included, so
// scatter it: the prefix into m.length, the payload into m.text.
struct iovec iov[2] = { { &m.length, sizeof(int) }, { m.text, txtlen } };
readv(sock, iov, 2);           // needs <sys/uio.h>
m.length = txtlen;             // keep the host-order length
m.text[txtlen] = '\0';

printf("%s\n", m.text);
free(m.text);

Good luck with your project--network programming is always a learning experience!

Adam Liss
  • I refrained from checking the realloc in the code example to keep it simple. As stated above, two read calls will read two packets and that is why I preferred a struct. – XZS Oct 28 '11 at 13:51

Why not do this?

message = (struct message *)malloc(sizeof(struct message));
read(sock, &message->length, sizeof(int));
message->length = ntohl(message->length);
message->text = (char *)malloc(message->length + 1);
read(sock, message->text, message->length);
message->text[message->length] = 0;

Ed Heal
  • Unfortunately this will not work as it receives two datagrams. With the first read, the whole datagram is discarded, even if it contains more than just the int. – XZS Oct 28 '11 at 13:52

UDP datagrams are limited to 64K, then ethernet frames are 1500 bytes (unless your network is using jumbo frames, which could be up to 9000 bytes). Protocol designers usually try to avoid IP fragmentation, so most likely your incoming packets are small, i.e. less then 1500 bytes.

I would just start with a static buffer of 1472 bytes (the 1500-byte Ethernet payload minus 20 bytes of IP header and 8 bytes of UDP header). If you have to deal with arbitrary protocols, bump that up to 64K. If you cannot afford that, collect actual sizes with MSG_PEEK, find some convenient average, and set up a fall-back plan with malloc(3).
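
Since the goal is a higher-level language anyway, here is a minimal sketch of this static-buffer approach in Python (the address and port are made up; on POSIX systems, anything beyond the buffer size is silently discarded):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 5000))  # hypothetical port

# 1472 = 1500-byte Ethernet payload - 20-byte IP header - 8-byte UDP header.
# recvfrom() returns exactly one whole datagram per call.
data, addr = sock.recvfrom(1472)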

Nikolai Fetissov
  • I expect that the 64K limit follows from the 16-bit length field in the UDP header, so this should indeed suffice for a safe buffer. But even 64K is in fact a waste, considering that the memory needed could be known beforehand by reading out just that int. – XZS Oct 28 '11 at 13:53
  • Yes, 64K is definitely a waste, but in this day and age it's really not a problem, unless you are in some embedded/resource-constrained environment. – Nikolai Fetissov Oct 29 '11 at 08:26