
Is there some way to determine whether, between two programmatically defined points (e.g., a `start()` and `stop()` method), the current thread has been interrupted for any reason?

This would include timer switches, hardware-driven interrupts, etc.

A hacky, non-portable solution is fine.

BeeOnRope
  • There's no `perf` event for that, is there? Could `start` and `stop` read `/proc/interrupts` and check that no counters for the CPU you're pinned to incremented? This is subject to false positives, especially if those `open` / `read` system calls take too long. (Or does `lseek` work to reread `/proc/interrupts`?) What about System Management Mode SMIs that even the kernel doesn't know about? Do you care about those? – Peter Cordes Jun 11 '18 at 02:04
  • @Peter I actually wrote this question because I was using the performance counter "Hardware interrupts", but it doesn't seem to catch all of the interrupts. I can't be completely sure, but that's my impression: for example, in a tight loop with a very specific profile I occasionally see uops executed that don't appear in the loop at all, but HW.INTERRUPTS doesn't budge. `/proc/interrupts` is my backup plan, but it seems heavy... – BeeOnRope Jun 11 '18 at 02:12
  • Which processor are you on? The hardware interrupts PMU event is not reliable (not officially supported) on some uarchs, such as Haswell. If your processor does not support the hardware interrupts event(s), the only option is to use the kernel's interrupt counters. – Hadi Brais Jun 11 '18 at 03:00
  • @HadiBrais Skylake client. Kernel interrupt counts could work - can they be accessed other than by parsing `/proc/interrupts`? – BeeOnRope Jun 11 '18 at 03:24
  • On Skylake, the event is `HW_INTERRUPTS.RECEIVED`. I don't know whether it's deterministic. The `/proc/interrupts` file can only be accessed as a file; there is no other interface. That said, I suggest using [/proc/stat](http://man7.org/linux/man-pages/man5/proc.5.html) instead. Its `irq` statistic tells how much time the CPU has spent servicing interrupts (excluding deferred bottom halves, which I don't think you care about). So if it increases between your two points, you know that at least one interrupt has occurred. The advantage is that it's just a single number to read per CPU. – Hadi Brais Jun 11 '18 at 04:11
  • Yeah I am using `HW_INTERRUPTS.RECEIVED` already, but it reads 0 even when I seem to have some unexpected code running. – BeeOnRope Jun 11 '18 at 04:12
  • Whatever counter you're using as an indication that `HW_INTERRUPTS.RECEIVED` does not count all interrupts may itself be incremented by some other event. It's hard to know for sure, although it may not be impossible. – Hadi Brais Jun 11 '18 at 04:16
  • It could be - it was many counters all showing periodic blips: it's a tight loop of only stores, and sometimes you'd see an execution that would take 2 or 3 times as long, with non-zero counts for "uops executed on port X" (normally 0) and various other counters all leading to the impression that some other code was running. If it wasn't my code doing this, the only thing I could think of was some type of interrupt (or SMM). Maybe a `softirq`? The `HW_INTERRUPTS.RECEIVED` read 0 though. @HadiBrais – BeeOnRope Jun 11 '18 at 04:18
  • SMIs (System Management Interrupts) may occur multiple times per second. These are not counted in `/proc/interrupts` or `/proc/stat`; instead, there is an MSR that counts them. Handling an SMI can take more than 1 ms, but you can analyze them with some effort. I don't know whether `HW_INTERRUPTS.RECEIVED` counts SMIs. I doubt it. – Hadi Brais Jun 11 '18 at 04:38
  • On my Haswell (on which `HW_INTERRUPTS.RECEIVED` is not officially supported), I observed that it gets incremented a couple hundred times per second. – Hadi Brais Jun 11 '18 at 04:52
  • @HadiBrais - yes, as far as I know it is "working" on my Skylake. I have also seen it increment, and it has correlated with unexpected jumps in runtime before. Currently, however, I am seeing a few blips but no `HW_I.R` events. Thanks for the SMM note: you may be interested in [this question](https://stackoverflow.com/q/50790715/149138). – BeeOnRope Jun 11 '18 at 04:56
  • One interesting experiment you can do is run the same benchmark multiple times and record `irq` and `HW_INTERRUPTS.RECEIVED` and execution time. Then subtract `irq` from execution time and observe the variance in measurement. – Hadi Brais Jun 11 '18 at 05:03
  • Oh, your loop includes stores to memory? To the same location or different locations? Any interrupt could cause cache lines to be evicted from L1D or TLB. – Hadi Brais Jun 11 '18 at 05:04
  • @HadiBrais - yes, it has stores to memory, all to the same location. Yes, an interrupt could evict lines, but evicted lines wouldn't explain ops suddenly showing up on `p0156`. Of course, an interrupt itself could explain that, but as noted above, `HW_INTERRUPTS.RECEIVED` is zero. The runtime delta is only about 200 ns. – BeeOnRope Jun 11 '18 at 05:08
  • If possible, can you tell me more details so that I can reproduce that situation? So far, we have determined two possible reasons: SMI or flawed `HW_INTERRUPTS.RECEIVED`. But there could be others. – Hadi Brais Jun 11 '18 at 05:35
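
Peter Cordes's `/proc/interrupts` idea can be sketched in C as follows: snapshot the column for the pinned CPU at `start()` and `stop()`, then compare the two values. The helper names `read_proc_file` and `sum_interrupts_for_cpu` are hypothetical. The sketch assumes the usual layout: a header row of `CPUn` labels, followed by one row per interrupt source with one count per online CPU. Rows with fewer counts, such as `ERR:` and `MIS:`, are skipped. Parsing is kept separate from the file read so it can be checked against sample text.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a small /proc file into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 if the file can't be opened. */
long read_proc_file(const char *path, char *buf, size_t cap)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}

/* Sum every count in the "CPU<cpu>" column of /proc/interrupts-style text.
 * Returns -1 if no such column exists (e.g. the CPU is offline). */
long long sum_interrupts_for_cpu(const char *text, int cpu)
{
    char want[32];
    snprintf(want, sizeof want, "CPU%d", cpu);

    /* Header row: find which column is labelled CPU<cpu>. */
    const char *nl = strchr(text, '\n');
    if (!nl)
        return -1;
    int ncols = 0, col = -1;
    for (const char *p = text; p < nl;) {
        while (p < nl && isspace((unsigned char)*p))
            p++;
        if (p == nl)
            break;
        const char *tok = p;
        while (p < nl && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - tok);
        if (len == strlen(want) && strncmp(tok, want, len) == 0)
            col = ncols;
        ncols++;
    }
    if (col < 0)
        return -1;

    long long total = 0;
    const char *line = nl + 1;
    while (*line) {
        const char *end = strchr(line, '\n');
        if (!end)
            end = line + strlen(line);
        const char *colon = memchr(line, ':', (size_t)(end - line));
        if (colon) {
            const char *q = colon + 1;
            unsigned long long mine = 0;
            int j;
            for (j = 0; j < ncols; j++) {
                while (q < end && (*q == ' ' || *q == '\t'))
                    q++;
                if (q == end || !isdigit((unsigned char)*q))
                    break;
                char *after;
                unsigned long long v = strtoull(q, &after, 10);
                if (j == col)
                    mine = v;
                q = after;
            }
            if (j == ncols) /* only rows carrying one count per CPU */
                total += (long long)mine;
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    return total;
}
```

To use it, pin the thread to one CPU (for example with `sched_setaffinity`). At `start()` and again at `stop()`, call `read_proc_file("/proc/interrupts", buf, sizeof buf)` followed by `sum_interrupts_for_cpu(buf, cpu)`. A larger second sum means the kernel handled at least one interrupt on that CPU in between. Size `buf` generously, because the file grows with the CPU count. As Peter notes, an interrupt that lands during the reads themselves is a false positive, and SMIs never show up here.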
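
For the PMU route discussed above, here is a sketch that uses the raw `perf_event_open(2)` syscall to count `HW_INTERRUPTS.RECEIVED` for the calling thread between `start()` and `stop()`. The raw encoding `0x01cb` (event select 0xCB, umask 0x01) is my reading of Intel's Skylake event list and is an assumption; check it against `perf list` on the target machine. The function names are hypothetical. The sketch does not set `exclude_kernel`, so it usually needs `perf_event_paranoid` of at most 1 or elevated privileges. Whether the counter catches every interrupt is exactly what's in question above.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed raw encoding of HW_INTERRUPTS.RECEIVED on Skylake:
 * umask 0x01 in bits 15:8, event select 0xCB in bits 7:0. Verify locally. */
#define HW_INTERRUPTS_RECEIVED_RAW 0x01cbULL

/* Open a disabled counter for the calling thread, on whatever CPU it runs.
 * Returns a file descriptor, or -1 (errno set) if perf is unavailable. */
int hw_int_open(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_RAW;
    attr.config = HW_INTERRUPTS_RECEIVED_RAW;
    attr.disabled = 1; /* counting starts in hw_int_start() */
    /* pid 0 = this thread, cpu -1 = any CPU, group_fd -1, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* The "start()" point: zero the counter and begin counting. */
int hw_int_start(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) != 0)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* The "stop()" point: stop counting and return the number of events
 * since hw_int_start(), or -1 on error. */
long long hw_int_stop(int fd)
{
    uint64_t count = 0;
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) != 0)
        return -1;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (long long)count;
}
```

With the default `read_format`, a `read` returns a single 64-bit count, which is why `hw_int_stop` reads exactly 8 bytes.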
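
Hadi Brais's `/proc/stat` suggestion can be sketched in a similar way: pull the `irq` column (the sixth number) from the `cpuN` row at start and stop. One caveat: proc(5) reports these fields in USER_HZ clock ticks (`sysconf(_SC_CLK_TCK)`, typically 100 per second). A short interrupt may therefore not move the value at all. A nonzero delta shows that interrupt time was accounted, but a zero delta does not prove there were no interrupts. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return the `irq` field (in USER_HZ ticks) of the "cpu<cpu>" row in
 * /proc/stat-formatted text, or -1 if the row is missing or malformed. */
long long stat_irq_ticks(const char *text, int cpu)
{
    char want[32];
    int wlen = snprintf(want, sizeof want, "cpu%d ", cpu);
    for (const char *line = text; *line;) {
        if (strncmp(line, want, (size_t)wlen) == 0) {
            /* Columns: user nice system idle iowait irq softirq ... */
            unsigned long long f[6];
            if (sscanf(line + wlen, "%llu %llu %llu %llu %llu %llu",
                       &f[0], &f[1], &f[2], &f[3], &f[4], &f[5]) != 6)
                return -1;
            return (long long)f[5];
        }
        const char *nl = strchr(line, '\n');
        if (!nl)
            break;
        line = nl + 1;
    }
    return -1;
}

/* Convenience wrapper that reads the live /proc/stat. The per-CPU rows come
 * first, so a truncated read still contains them. */
long long stat_irq_ticks_now(int cpu)
{
    static char buf[1 << 16];
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return stat_irq_ticks(buf, cpu);
}
```

The trailing space in the `"cpu%d "` prefix keeps `cpu1` from matching the `cpu10` row.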
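
For SMIs, the MSR Hadi mentions is, to my knowledge, Intel's `MSR_SMI_COUNT` (address 0x34), whose low 32 bits count SMIs since reset. The sketch below reads it through Linux's `msr` driver, which needs root and `modprobe msr`. Comparing the value at `start()` and `stop()` on the pinned CPU would show whether an SMI landed in between. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Intel MSR_SMI_COUNT: bits 31:0 hold the number of SMIs since reset. */
#define MSR_SMI_COUNT 0x34

/* Read MSR_SMI_COUNT on `cpu` via /dev/cpu/N/msr, where the file offset
 * selects the MSR. Returns -1 if it can't be read (driver not loaded,
 * not root, MSR unsupported on this CPU, and so on). */
long long read_smi_count(int cpu)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    uint64_t val = 0;
    ssize_t n = pread(fd, &val, sizeof val, MSR_SMI_COUNT);
    close(fd);
    if (n != (ssize_t)sizeof val)
        return -1;
    return (long long)(val & 0xffffffffULL);
}
```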

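The experiment Hadi suggests above comes down to a sample variance over per-run differences: subtract each run's `irq` time from its execution time, then look at the spread. The sketch below assumes both arrays are already in the same unit, for example nanoseconds. `/proc/stat` ticks would need scaling by 1e9 / `sysconf(_SC_CLK_TCK)` first. The function name is hypothetical.

```c
#include <stddef.h>

/* Sample variance of (exec[i] - irq[i]) over n runs.
 * Both arrays must use the same time unit. Returns -1.0 if n < 2. */
double adjusted_time_variance(const double *exec, const double *irq, size_t n)
{
    if (n < 2)
        return -1.0;
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += exec[i] - irq[i];
    mean /= (double)n;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (exec[i] - irq[i]) - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

Computing the same variance with an all-zero `irq` array gives the raw-time baseline. If the adjusted figure is much smaller, interrupt time plausibly accounts for much of the noise. If it barely changes, the noise likely has some other source, such as SMIs, which `irq` does not see.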