3

I want to insert a timestamp into the packets I receive in my XDP program. The only way I know how to get a timestamp is by calling bpf_ktime_get_ns.

But what would be the user-space equivalent function which creates comparable timestamps? As far as I know, ktime_get_ns returns the time since system start (in nanoseconds). There is

$ uptime
 11:45:35 up 2 days,  3:15,  3 users,  load average: 0.19, 0.29, 0.27

but this only returns the time since system start in seconds. So no precise measurement possible here (microsecond-level would be nice).

Edit: It was purely my fault. @Qeole and @tuilagio are completely right. I made a mistake in pointer arithmetic in my user space code where I was obtaining the pointer of the timestamp.

binaryBigInt
  • 1,526
  • 2
  • 18
  • 44
  • `clock_gettime()`? – Qeole Apr 01 '20 at 12:41
  • @Qeole Which one of these? `CLOCK_REALTIME`, `CLOCK_MONOTONIC`, `CLOCK_PROCESS_CPUTIME_ID`, `CLOCK_THREAD_CPUTIME_ID`. According to this description: https://linux.die.net/man/3/clock_gettime they are not the same as https://sourceware.org/systemtap/tapsets/API-ktime-get-ns.html – binaryBigInt Apr 01 '20 at 12:52
  • 1
    If you find nothing better, I suppose you always have the possibility to test-run (`BPF_PROG_TEST_RUN`) a BPF program to retrieve the timestamp from kernel :). – Qeole Apr 02 '20 at 13:54

2 Answers2

4

You can use clock_gettime() :

static unsigned long get_nsecs(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000UL + ts.tv_nsec;
}

and call it with unsigned long now = get_nsecs(); or uint64_t now = get_nsecs();

This return timestamp in nsec resolution.

Source: https://github.com/torvalds/linux/blob/master/samples/bpf/xdpsock_user.c#L114

Que0Le
  • 84
  • 5
  • Did you read my comment? `bpf_ktime_get_ns` and `clock_gettime` are not based on the same time source P.S: I know it sounds weird, like why would they provide a bpf helper function with a timestamp that is not comparable to the timestamps we can get in user space? Just doesn't make sense at all – binaryBigInt Apr 02 '20 at 08:13
  • @binaryBigInt Does make sense though. The clock is monotonic and its primary use case is to compute durations (as described in [commit](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d9847d310ab4) [logs](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=17ca8cbf49be3a)). Because it happens on the kernel side, it reuses the clock system that was already there. – Qeole Apr 02 '20 at 13:49
  • @Qeole So you are saying that `bpf_ktime_get_ns` and `clock_gettime` are comparable? The reason why I am asking is because I get completely unrealistic results such as `65500` (which is close to `UINT16_MAX` (the datatype I use for µs-Latency)). I don't think it takes 65ms for a packet to travel from XDP-Kernel to userspace... – binaryBigInt Apr 02 '20 at 16:37
  • @binaryBigInt No I'm saying that running `bpf_ktime_get_ns()` from a program hooked on e.g. TC ingress and `bpf_ktime_get_ns()` again from another hooked on e.g. TC egress is comparable. Or for tracing, running `bpf_ktime_get_ns()` both at entry (kprobe) and exit (kretprobe), or in different functions on a given code path, is comparable. I've never tried to compare with user timestamps, so no idea for your use case. Using a BPF program to get the timestamp (just call helper and return value) is a hack but could work though, it shouldn't cost much more than a syscall. – Qeole Apr 02 '20 at 16:58
  • 1
    @binaryBigInt It muss have something to do with your code. I also measure the time since packets arrived to the point they entered userspace and get value < 10 usec (microsecond). – Que0Le Apr 03 '20 at 07:58
  • 1
    The measurement was done using `bpf_ktime_get_ns` in kernel space code and `clock_gettime` in user space code. About the time sources I am not sure, the results however seem reasonable. Sorry I overlooked your post. – Que0Le Apr 03 '20 at 08:09
  • @tuilagio Thanks for your reply, did you use `CLOCK_MONOTONIC`? How did you insert the timestamp into the packet in your XDP-program? – binaryBigInt Apr 03 '20 at 09:29
  • The funny thing is even though I sleep for 1000µs to gather packets and then process them in a batch, I get latencies of 100µs – binaryBigInt Apr 03 '20 at 09:39
  • 1
    @binaryBigInt The time stamp returned by `bpf_ktime_get_ns` gets written in packet when packet arrived kernel, and in userspace the user code call `get_nsecs()` when packet arrived. Just use the code I provided and see. – Que0Le Apr 03 '20 at 15:29
3

This is probably not the canonical way to do this, but at least it's fun: We can retrieve the kernel timestamp... from BPF itself!

BPF subsystem has a “test-run” feature that allows to test some types for programs with user-provided data, the run being triggered with a bpf() system call. Here is a sample application doing just so:

  1. It loads a BPF program (XDP, but type does not matter much) and gets a FD.
  2. It reuses the FD for triggering a “test-run” of that BPF program.
  3. When it runs, the program calls bpf_ktime_get_ns(), copies the value to the data output buffer (data_out), and we just have to read that to get the timestamp.
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <linux/bpf.h>

int main(__attribute__((unused))int argc,
         __attribute__((unused))char **argv)
{
    union bpf_attr load_attr = { }, run_attr = { };
    const struct bpf_insn insns[] = {
        /* w0 = 1                | r0 = XDP_DROP */
        { .code = 0xb4, .src_reg = 0, .dst_reg = 0, .off = 0, .imm = 1, },
        /* r2 = *(u32 *)(r1 + 4) | r2 = ctx->data_end */
        { .code = 0x61, .src_reg = 1, .dst_reg = 2, .off = 4, .imm = 0, },
        /* r6 = *(u32 *)(r1 + 0) | r6 = ctx->data */
        { .code = 0x61, .src_reg = 1, .dst_reg = 6, .off = 0, .imm = 0, },
        /* r1 = r6               | r1 = ctx->data */
        { .code = 0xbf, .src_reg = 6, .dst_reg = 1, .off = 0, .imm = 0, },
        /* r1 += 8               | r1 += sizeof(uint64_t) */
        { .code = 0x07, .src_reg = 0, .dst_reg = 1, .off = 0, .imm = 8, },
        /* if r1 > r2 goto +3    | if (data + 8 > data_end) return */
        { .code = 0x2d, .src_reg = 2, .dst_reg = 1, .off = 3, .imm = 0, },
        /* call bpf_ktime_get_ns() */
        { .code = 0x85, .src_reg = 0, .dst_reg = 0, .off = 0, .imm = BPF_FUNC_ktime_get_ns, },
        /* *(u64 *)(r6 + 0) = r0 | *(ctx->data) = bpf_ktime_get_ns() */
        { .code = 0x7b, .src_reg = 0, .dst_reg = 6, .off = 0, .imm = 0, },
        /* w0 = 2                | r0 = XDP_PASS */
        { .code = 0xb4, .src_reg = 0, .dst_reg = 0, .off = 0, .imm = 2, },
        /* exit                  | return r0 */
        { .code = 0x95, .src_reg = 0, .dst_reg = 0, .off = 0, .imm = 0, },
    };
    const char license[] = "GPL";   /* required for bpf_ktime_get_ns() */
    /*
     * Data buffers data_in/data_out must be at least the minimal size for
     * an Ethernet frame: 14 header bytes
     */
    const uint8_t data_out[14];
    const uint8_t data_in[14];
    int fd, res;

    /* Load program */

    load_attr.prog_type = BPF_PROG_TYPE_XDP;
    load_attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
    load_attr.insns = (uint64_t)insns;
    load_attr.license = (uint64_t)license;

    fd = syscall(__NR_bpf, BPF_PROG_LOAD, &load_attr, sizeof(load_attr));
    if (fd < 0) {
        fprintf(stderr, "failed to load BPF program: %s\n",
                strerror(errno));
        return EXIT_FAILURE;
    }

    /* Run program */

    run_attr.test.prog_fd = fd;
    run_attr.test.data_size_in = sizeof(data_in);
    run_attr.test.data_size_out = sizeof(data_out);
    run_attr.test.data_in = (uint64_t)data_in;
    run_attr.test.data_out = (uint64_t)data_out;

    res = syscall(__NR_bpf, BPF_PROG_TEST_RUN, &run_attr, sizeof(run_attr));
    if (res) {
        fprintf(stderr, "failed to run BPF program: %s\n",
                strerror(errno));
        close(fd);
        return EXIT_FAILURE;
    }

    /* Extract result */

    fprintf(stdout, "%lu\n", (uint64_t)run_attr.test.data_out);

    close(fd);
    return EXIT_SUCCESS;
}

Note that we could also extract data from the return value of the program (run_attr.test.retval), but this is a 32-bit integer so you would not get the full timestamp. This could be used to retrieve e.g. only the number of the seconds for that timestamp, with a right shift r0 >>= 32, to avoid doing data/data_end length check and copying to data_out. Not that it should change much in performance.

Running the whole application (load + run) will obviously take longer than subsequent runs, because of the verification step done in the kernel when loading the program.

Addendum: The BPF program was generated from the following code:

#include <linux/bpf.h>

static unsigned long long (*bpf_ktime_get_ns)(void) =
    (void *)BPF_FUNC_ktime_get_ns;

int xdp(struct xdp_md *ctx)
{
    void *data_end = (void *) (long) ctx->data_end;
    void *data = (void *) (long) ctx->data;

    if (data + sizeof(unsigned long long) > data_end)
        return XDP_DROP;

    *(unsigned long long *)data = bpf_ktime_get_ns();
    return XDP_PASS;
}
Qeole
  • 8,284
  • 1
  • 24
  • 52