
I am implementing DPLPMTUD and I want to stop the Linux kernel from returning -1 with errno = EMSGSIZE when I send a UDP packet longer than the local interface's MTU. I want to avoid the pain of dealing with error handling when several datagrams are sent out (especially when using sendmmsg(2)), each perhaps belonging to a different connection. I'd rather have the kernel drop the packet and let the application's DPLPMTUD logic figure out the MTU.

ip(7) has this to say:

              It is possible to implement RFC 4821 MTU probing with SOCK_DGRAM
              or SOCK_RAW sockets by setting a value of IP_PMTUDISC_PROBE
              (available since Linux 2.6.22).  This is also particularly
              useful for diagnostic tools such as tracepath(8) that wish to
              deliberately send probe packets larger than the observed Path MTU.

Yet setting this option does not produce the desired effect. Here is the code to illustrate the problem:

/* emsgsize.c: test whether IP_PMTUDISC_PROBE suppresses EMSGSIZE
 *
 * Usage: emsgsize packet_size
 */

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHECK(w_, s_) do { if ((s_) < 0) { perror(w_); return 1; }} while (0)

/* Payload */
static unsigned char data[64 * 1024];

int
main (int argc, char **argv)
{
    int fd, on, s, size;
    struct sockaddr_in si;
    ssize_t sent;

    if (argc != 2)
    {
        fprintf(stderr, "usage: emsgsize size\n");
        return 1;
    }
    size = atoi(argv[1]);

    memset(&si, 0, sizeof(si));
    si.sin_family = AF_INET;

    fd = socket(si.sin_family, SOCK_DGRAM, 0);
    CHECK("socket", fd);

    s = bind(fd, (struct sockaddr *) &si, sizeof(si));
    CHECK("bind", s);

    /* This is supposed to suppress sendmsg(2) returning -1 with
     * errno = EMSGSIZE, see ip(7):
     *
     "        It is possible to implement RFC 4821 MTU probing with SOCK_DGRAM
     "        or SOCK_RAW sockets by  setting  a  value  of  IP_PMTUDISC_PROBE
     "        (available  since Linux 2.6.22).  This is also particularly use-
     "        ful for diagnostic tools such as tracepath(8) that wish  to  de-
     "        liberately send probe packets larger than the observed Path MTU.
     */
    on = IP_PMTUDISC_PROBE;
    s = setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &on, sizeof(on));
    CHECK("setsockopt", s);

    memset(&si, 0, sizeof(si));
    si.sin_family = AF_INET;
    si.sin_port = htons(12345); /* Destination does not matter */
    s = inet_pton(AF_INET, "127.0.0.1", &si.sin_addr);
    CHECK("inet_pton", s);
    sent = sendto(fd, data, (size_t) size, 0, (struct sockaddr *) &si,
                                                            sizeof(si));
    CHECK("sendto", sent);

    return 0;
}

When I send packets larger than the MTU, sendto() above returns -1 and errno is set to EMSGSIZE -- exactly what I want to avoid.

Is there a way to do what I want?
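Short of suppressing the error, one workaround is to absorb it in a thin wrapper, so that an oversize datagram looks like a lost probe to the DPLPMTUD logic. Below is a minimal sketch, not a definitive implementation. It relies on the sendmmsg(2) behavior that an error is reported only when the first datagram of a call fails, and `send_batch_drop_oversize` is a hypothetical name of my own.

```c
#define _GNU_SOURCE             /* For sendmmsg(2) and struct mmsghdr */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send a batch with sendmmsg(2), treating every
 * datagram rejected with EMSGSIZE as if the network had dropped it.
 *
 * Returns the number of datagrams consumed (sent or dropped).  On any
 * other error, returns the number consumed so far, or -1 if none were;
 * errno is left as set by the failing call.
 */
static int
send_batch_drop_oversize (int fd, struct mmsghdr *msgs, unsigned count,
                                                    unsigned *n_dropped)
{
    unsigned off = 0;
    int s;

    *n_dropped = 0;
    while (off < count)
    {
        s = sendmmsg(fd, msgs + off, count - off, 0);
        if (s > 0)
            off += (unsigned) s;    /* These went out; try the rest */
        else if (s < 0 && errno == EMSGSIZE)
        {
            /* The error refers to the first datagram of this call,
             * msgs[off]: count it as dropped and skip it.
             */
            ++*n_dropped;
            ++off;
        }
        else
            return off > 0 ? (int) off : -1;    /* E.g. EAGAIN */
    }

    return (int) off;
}
```

The caller still gets one return value for the whole batch, and the probe timer in the DPLPMTUD state machine takes care of the datagrams counted in `n_dropped`.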

Dmitri
  • `IP_PMTUDISC_PROBE` sets the do-not-fragment flag (which causes an `EMSGSIZE` error to be returned for messages that are too long). Use `IP_PMTUDISC_WANT` instead: that allows fragmenting the datagram, but does path MTU discovery too (and sets the DF flag for datagrams that are not too long). I recommend you look at the official [`man 7 ip`](https://man7.org/linux/man-pages/man7/ip.7.html) page at man7.org for the most accurate, up-to-date documentation; the options are well described there. – None Jul 10 '20 at 23:50
  • I don't want the kernel to do the MTU discovery, though; I want to do it myself. – Dmitri Jul 11 '20 at 13:09
  • Of course it is my job. The whole idea behind DPLPMTUD is that you discover the PMTU at the PL -- the Packetization Layer. – Dmitri Jul 11 '20 at 14:21
  • @Dmitri: Then use `IP_PMTUDISC_DONT`. No `EMSGSIZE` errors, and messages exceeding MTU are just dropped. – None Jul 11 '20 at 15:25
  • @None, I want the DF bit set, though. It is required by the [QUIC Internet Draft](https://tools.ietf.org/html/draft-ietf-quic-transport-29#section-14): "UDP datagrams MUST NOT be fragmented at the IP layer. In IPv4 [IPv4], the DF bit MUST be set to prevent fragmentation on the path." – Dmitri Jul 12 '20 at 13:49
  • [Here is the commit](https://github.com/torvalds/linux/commit/628a5c561890a) where `IP_PMTUDISC_PROBE` was added. The changes in `net/ipv4/ip_output.c` indicate that you can send unfragmented packets larger than the path **MTU** to the destination, but still not larger than the device **MTU**. – SergA Apr 12 '22 at 23:03
