
My experience with the Linux kernel is minimal; I only recently started playing around with it.

For my research, I am trying to record the earliest arrival time of a packet. I can already do that at the device driver level by modifying the driver and recording timestamps in its interrupt handler function. Apologies in advance for the long post.

For example, I have modified this function (https://elixir.bootlin.com/linux/v4.7/source/drivers/net/ethernet/intel/i40e/i40e_main.c#L3232) to record a timestamp each time it is invoked.

Digging deeper and following the call chain of this invocation gives the following stack trace:

  • i40e_msix_clean_rings() - i40e driver's i40e_main.c in the link provided above
  • __handle_irq_event_percpu() - kernel/irq/handle.c
  • handle_irq_event_percpu() - kernel/irq/handle.c
  • handle_irq_event() - kernel/irq/handle.c
  • handle_edge_irq() - kernel/irq/chip.c
  • handle_irq() - arch/x86/kernel/irq_64.c
  • do_IRQ() - arch/x86/kernel/irq.c
  • common_interrupt() - I am not sure, but the implementation should be similar to early_idt_handler_common() in arch/x86/kernel/head_32.S

I am trying to record the arrival timestamp of my packet in do_IRQ() (listed in the stack trace above). For reference, my modified do_IRQ() looks like this:

__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
    struct pt_regs *old_regs = set_irq_regs(regs);
    struct irq_desc * desc;
    /* high bit used in ret_from_ code  */
    unsigned vector = ~regs->orig_ax;
    **int int_number;**

    entering_irq();

    /* entering_irq() tells RCU that we're not quiescent.  Check it. */
    RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");

    desc = __this_cpu_read(vector_irq[vector]);
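    **/* desc can be NULL or an error pointer here; check before dereferencing */**
    **if (!IS_ERR_OR_NULL(desc)) {**
    **    int_number = desc->irq_data.irq;**
    **    printk(KERN_INFO "IRQ Number=%d; Vector=%u\n", int_number, vector);**
    **}**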

    if (!handle_irq(desc, regs)) {
        ack_APIC_irq();

        if (desc != VECTOR_RETRIGGERED) {
            pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
                         __func__, smp_processor_id(),
                         vector);
        } else {
            __this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
        }
    }

    exiting_irq();

    set_irq_regs(old_regs);
    return 1;
}

To make my intent clearer, my changes to this function are wrapped in "**".

On my test machine, the NIC is bound to IRQ number 19, and the int_number variable holds that number. This lets me track interrupts for one particular IRQ number.

This may not matter for a single-queue NIC, but it does for a multi-queue adapter: I can steer my packets to a fixed queue with Flow Director, and each queue is bound to its own IRQ number. That makes my packets easy to trace.

My approaches:

  1. Adding the instrumentation manually inside this function, which I do not think is the right approach.
  2. Using kprobes. But do they let me filter my traces based on the contents of variables or arguments?
  3. Using jprobes. I guess this approach lets me work with the function's arguments. I was able to handle the event, though; I simply followed the jprobe examples (https://stuff.mit.edu/afs/sipb/contrib/linux/samples/kprobes/jprobe_example.c) and several more.
  4. While going through the approaches above, I also came across other tools such as perf, perf-tools, and eBPF, but I am not sure which would be the best fit for me.

Just to clarify my final task: I am trying to capture the earliest arrival timestamps of my packets, like:

t1
t2
t3
t4 
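
To illustrate what would be done with such a list, here is a minimal user-space sketch, assuming the timestamps are collected in nanoseconds and are non-decreasing. It computes the smallest and largest gap between consecutive arrivals and their difference as a simple jitter figure. The names gap_stats and compute_gap_stats are hypothetical and used only for illustration; they are not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of the gaps between consecutive arrival timestamps. */
struct gap_stats {
    uint64_t min_gap_ns;  /* smallest gap between two consecutive arrivals */
    uint64_t max_gap_ns;  /* largest gap between two consecutive arrivals */
    uint64_t spread_ns;   /* max_gap_ns - min_gap_ns */
};

/*
 * Compute gap statistics over n timestamps given in nanoseconds.
 * Returns 0 on success, or -1 if fewer than two timestamps are given,
 * a pointer is NULL, or the sequence goes backwards in time.
 */
int compute_gap_stats(const uint64_t *ts_ns, size_t n, struct gap_stats *out)
{
    uint64_t min_gap = UINT64_MAX;
    uint64_t max_gap = 0;
    size_t i;

    if (ts_ns == NULL || out == NULL || n < 2)
        return -1;

    for (i = 1; i < n; i++) {
        uint64_t gap;

        if (ts_ns[i] < ts_ns[i - 1])
            return -1;  /* timestamps must be non-decreasing */

        gap = ts_ns[i] - ts_ns[i - 1];
        if (gap < min_gap)
            min_gap = gap;
        if (gap > max_gap)
            max_gap = gap;
    }

    out->min_gap_ns = min_gap;
    out->max_gap_ns = max_gap;
    out->spread_ns = max_gap - min_gap;
    return 0;
}
```

Running this once per tracing level (do_IRQ, the driver's IRQ handler, packet harvesting) would give one gap_stats per level, which makes it easier to see where the variation is introduced.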

I would appreciate any input on this.

Thank you!

cooshal
  • First, did you make sure your hardware NIC doesn't support timestamping? Hardware timestamping is likely to be much more accurate than any software timestamping. Second, do you only want the time of arrivals of new packets (as in the example list at the end of your post) or do you want the time of arrival with the corresponding packet? If it's the second case, how should the packet be identified? – pchaigno Apr 05 '18 at 12:06
  • thank you for getting back. yes, that's going to be a tricky part. otherwise, I could have just used perf or something like that to trace the irq events. without core kernel patching, my current approach in the device driver is: 1. record the time *t1* at irq handler function(i40e_msix_clean_rings) 2. while harvesting this packet(inside i40e_poll function), if it is UDP port XX, then confirm t1 as the arrival time of the packet; else ignore it. I am not sure, how can I trace an IRQ event and confirm that IRQ event was indeed for my desired packet. – cooshal Apr 05 '18 at 12:17
  • I found this when I did: ethtool -T off (HWTSTAMP_TX_OFF) on (HWTSTAMP_TX_ON) – cooshal Apr 05 '18 at 12:18
  • Are you saying you currently identify packets based solely on the UDP port? I'm not sure how to interpret the output of `ethtool -T` to be honest. – pchaigno Apr 05 '18 at 12:40
  • Hi! I just checked Intel's forums and docs: my NIC doesn't support hardware timestamping. Coming back to the first question about identifying the type of packet: yes! I am sending these specific UDP packets on port XX from Host A to B. This is just a proof of concept to analyze the jitter in the arrival time of synchronous packets. I am relying on an optimistic approach here (that all Rx packets on this specific port are my desired packets). Later I can confirm this with some extra verification (contents, headers, etc.). – cooshal Apr 05 '18 at 12:49
  • I'm not sure we're talking about the same thing here. I guess it'd be easier to ask what you want to do with the timestamps once you have them? – pchaigno Apr 05 '18 at 13:02
  • well, first based on the timestamps tracked at each level (like do_IRQ; device irq handler; packet fetching, etc.), we can have a maximum jitter(max time difference between consecutive packets) at each level and also a better picture about the causes of such jitters (as I am talking about synchronous communication; it is very important), and may be optimize it further, for example with a poll mode driver. – cooshal Apr 05 '18 at 13:17
  • Ok, so you don't want to associate an arrival time to each packet, you just want to have the list of arrival times to be able to compute jitter and other metrics, right? – pchaigno Apr 05 '18 at 16:25
  • Well, that would be my first step. But eventually I have to make sure that the received IRQ was actually for the packet I wanted/expected. My approach would have been: 1. If the IRQ number is 19, record the timestamp in a global variable. 2. The IRQ is processed as normal and passed on to the device driver; a timestamp is recorded there as well, in the same global variable. 3. If it is MY packet, confirm that this is MY packet. But the problem is: how can I associate that IRQ with my packet? I came across kprobe and jprobe and thought they might help me out. Not sure if I am doing this right. – cooshal Apr 05 '18 at 17:05
  • How do you know a packet is your packet? – pchaigno Apr 05 '18 at 17:09
  • I will know this from the device driver's function which fetches the packet; i.e. static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)[https://elixir.bootlin.com/linux/v4.15.12/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L2065] . Inside this function, I can fetch the packet and I can get information about my packet through packet headers. – cooshal Apr 05 '18 at 17:34
  • Ok... but which fields of the packet header are going to identify that packet uniquely?? – pchaigno Apr 05 '18 at 21:07
  • In my case, it is the protocol and port number, i.e. UDP and port 319, and with ethtool's Flow Director settings I have configured those packets to be directed to a fixed Rx queue (e.g. Rx queue 5). – cooshal Apr 05 '18 at 21:49
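
The record-then-confirm idea from the comments can be sketched in plain user-space C. This is only a model under stated assumptions, not kernel code: the timestamp is passed in as a number, and the names arrival_tracker, tracker_init, tracker_on_irq and tracker_on_packet are hypothetical. In this sketch a non-matching packet discards the pending timestamp, which is one possible reading of "else ignore it".

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_UDP 17  /* IP protocol number for UDP */

struct arrival_tracker {
    int      target_irq;     /* IRQ number of the watched Rx queue, e.g. 19 */
    uint16_t target_port;    /* UDP destination port to match, e.g. 319 */
    uint64_t pending_ts_ns;  /* candidate timestamp taken at IRQ time */
    bool     has_pending;    /* true while a candidate awaits confirmation */
};

void tracker_init(struct arrival_tracker *t, int irq, uint16_t port)
{
    t->target_irq = irq;
    t->target_port = port;
    t->pending_ts_ns = 0;
    t->has_pending = false;
}

/* IRQ stage: keep the earliest unconfirmed timestamp for the watched IRQ. */
void tracker_on_irq(struct arrival_tracker *t, int irq, uint64_t now_ns)
{
    if (irq != t->target_irq)
        return;
    if (!t->has_pending) {
        t->pending_ts_ns = now_ns;
        t->has_pending = true;
    }
}

/*
 * Harvest stage: called for the packet pulled from the ring. Returns true
 * and stores the candidate timestamp in *arrival_ns if the packet matches;
 * otherwise returns false. The candidate is consumed either way.
 */
bool tracker_on_packet(struct arrival_tracker *t, uint8_t ip_proto,
                       uint16_t dst_port, uint64_t *arrival_ns)
{
    if (!t->has_pending)
        return false;

    t->has_pending = false;

    if (ip_proto != PROTO_UDP || dst_port != t->target_port)
        return false;

    *arrival_ns = t->pending_ts_ns;
    return true;
}
```

Note that this does not solve the association problem raised above: if several packets are harvested after a single interrupt, only the first one is compared against the candidate timestamp.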
