
I'm trying to write some code to determine whether clock_gettime used with CLOCK_MONOTONIC_RAW will give me results that come from the same hardware clock on different cores.

From what I understand, it is possible for each core to produce independent results, but not always. I was given the task of obtaining timings on all cores with a precision of 40 nanoseconds.

The reason I'm not using CLOCK_REALTIME is that my program absolutely must not be affected by NTP adjustments.
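
For reference, the kind of per-core reading I have in mind looks roughly like this (a minimal sketch; the core count, the pinning, and the lack of error handling are all placeholders):

```c
/* Minimal sketch: pin the calling thread to a core, then read
 * CLOCK_MONOTONIC_RAW there.  The core count below is just an example. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>

static long long read_raw_ns_on_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;                               /* could not pin to that core */

    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);     /* unaffected by NTP slewing */
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
    for (int core = 0; core < 4; core++)         /* 4 cores is just an example */
        printf("core %d: %lld ns\n", core, read_raw_ns_on_core(core));
    return 0;
}
```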

Edit:

I have found the kernel's unsynchronized_tsc function, which tests whether the TSC is synchronized across all cores. I am now attempting to find out whether CLOCK_MONOTONIC_RAW is based on the TSC.
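
To check the latter, I am reading the kernel's active clocksource from sysfs (this assumes, as I currently believe, that CLOCK_MONOTONIC_RAW is driven by whatever clocksource the kernel has selected); a quick sketch:

```c
/* Sketch: report the kernel's active clocksource ("tsc", "hpet",
 * "acpi_pm", ...).  The sysfs path below is the standard Linux location. */
#include <stdio.h>

int main(void)
{
    char buf[64] = "unknown";
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/"
                    "current_clocksource", "r");
    if (f) {
        if (fgets(buf, sizeof buf, f) == NULL)
            snprintf(buf, sizeof buf, "unreadable");
        fclose(f);
    }
    printf("current clocksource: %s", buf);
    return 0;
}
```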

Final edit:

It turns out that CLOCK_MONOTONIC_RAW is always usable on multi-core systems and does not rely on the TSC even on Intel machines.

OlivierLi
    Why is it ever interesting to anyone except implementers of `CLOCK_MONOTONIC_RAW`? They give you a global (not per-core), unadjusted, monotonic clock. They worry about any magic needed to produce it, so that you don't have to. If the TSCs are not stable or synchronized they will use something else. – n. m. could be an AI Mar 24 '14 at 16:30
  • @n.m. I guess I could have formulated my question better. Knowing that the result of `CLOCK_MONOTONIC_RAW` is the same on all cores is enough for me. From your comment I understand that it is. Could you please point me to some documentation that confirms it? I would need it for my technical report at work. – OlivierLi Mar 24 '14 at 17:05
  • "the result of CLOCK_MONOTONIC_RAW is the same on all cores" --- it is not even clear what this statement could mean. Let me reiterate: there is only one raw monotonic clock, not one per core. It's exactly the same as with e.g. the real-time clock. Are you worried about the real-time clock being dependent on which core yoy are on? – n. m. could be an AI Mar 24 '14 at 17:16
  • I was under the impression that the CLOCK_MONOTONIC_RAW was always based on the TSCs. But now I understand. Sorry for the confusion and thanks for your time! – OlivierLi Mar 24 '14 at 17:19

1 Answer

To do measurements this precisely, you'd need:

  • code that's executed on all CPUs, that reads the CPU's time stamp counter and stores it as soon as "an event" occurs
  • some way to create "an event" that is noticed at the same time by all CPUs
  • some way to prevent timing problems caused by IRQs, task switches, etc.

Various possibilities for the event include:

  • polling a memory location in a loop, where one CPU writes a new value and other CPUs stop polling when they see the new value
  • using the local APIC to broadcast an IPI (inter-processor interrupt) to all CPUs

For both of these methods there are delays between the CPUs (especially for larger NUMA systems): a write to memory (cache) may be visible immediately on the CPU that made the write, but only become visible to a CPU on a different physical chip (in a different NUMA domain) later. To avoid this you may need to average over initiating the event from each CPU in turn. E.g. (for 2 CPUs) one CPU initiates and both measure, then the other CPU initiates and both measure, and the results are combined to cancel out any "event propagation latency".
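
For illustration, here is a minimal sketch of the "polled memory location" event combined with that two-direction averaging. It assumes x86 with `__rdtsc`, GCC/Clang on Linux, CPUs 0 and 1 with equal TSC frequencies, and it deliberately ignores IRQ noise and `rdtsc` serialisation:

```c
/* Sketch: estimate the TSC offset between CPU 0 and CPU 1 using the
 * "poll a shared memory location" event, run in both directions so the
 * propagation latency cancels out.  Build with: gcc -O2 -pthread skew.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <x86intrin.h>

static atomic_int ready, flag;
static unsigned long long t_writer, t_reader;

static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Writer: wait until the reader is spinning, note our TSC, flip the flag. */
static void *writer(void *arg)
{
    pin_to_cpu((int)(long)arg);
    while (atomic_load(&ready) == 0)
        ;
    t_writer = __rdtsc();
    atomic_store(&flag, 1);
    return NULL;
}

/* Reader: announce we're spinning, wait for the flag, note our TSC. */
static void *reader(void *arg)
{
    pin_to_cpu((int)(long)arg);
    atomic_store(&ready, 1);
    while (atomic_load(&flag) == 0)
        ;
    t_reader = __rdtsc();
    return NULL;
}

/* One direction: CPU 'from' initiates the event, CPU 'to' observes it.
 * Result is (reader TSC - writer TSC) = TSC offset + propagation delay. */
static long long one_way(int from, int to)
{
    pthread_t w, r;
    atomic_store(&flag, 0);
    atomic_store(&ready, 0);
    pthread_create(&r, NULL, reader, (void *)(long)to);
    pthread_create(&w, NULL, writer, (void *)(long)from);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return (long long)(t_reader - t_writer);
}

int main(void)
{
    long long d01 = one_way(0, 1);   /* (TSC1 - TSC0) + delay */
    long long d10 = one_way(1, 0);   /* (TSC0 - TSC1) + delay */
    /* Summing cancels the offset; subtracting cancels the delay. */
    printf("approx propagation delay : %lld ticks\n", (d01 + d10) / 2);
    printf("approx TSC offset (1 - 0): %lld ticks\n", (d01 - d10) / 2);
    return 0;
}
```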

To fix the other timing problems (IRQs, task switches, etc.) I'd want to do these tests during boot, where nothing else can mess things up. Otherwise you either need to prevent the problems (ensure all CPUs are running at the same speed, disable IRQs, disable thread switches, stop any PCI device bus mastering, etc.) or cope with the problems (e.g. run the same test many times and see if you get similar results most of the time).

Also note that all of the above can only ensure that the time stamp counters were in sync at the time the test was done, and don't guarantee that they won't become out of sync after the test is done. To ensure the CPUs remain in sync you'd need to rely on the CPU's "monotonic clock" guarantees (but older CPUs don't make that guarantee).
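
As an aside: on x86 the relevant hardware guarantee is the "invariant TSC" flag (CPUID leaf 0x80000007, EDX bit 8); a quick check, assuming GCC/Clang's `<cpuid.h>`, might look like this sketch:

```c
/* Sketch: check the x86 "invariant TSC" flag (CPUID.80000007H:EDX[8]).
 * If set, the TSC runs at a constant rate in all ACPI P-, C- and T-states. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx) && (edx & (1u << 8)))
        printf("invariant TSC: yes\n");
    else
        printf("invariant TSC: no (or CPUID leaf not supported)\n");
    return 0;
}
```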

Finally, if you're attempting to do this in user-space (and not in kernel code), then my advice is to design the code in a way that isn't so fragile to begin with. Even if the TSCs on different CPUs are guaranteed to be perfectly in sync at all times, you can't prevent an IRQ from interrupting immediately before or immediately after reading the TSC (and there's no way to atomically do something and read the TSC at the same time); so if your code requires such precisely synchronised timing, its design is probably flawed.

Brendan
  • Is there any reason executing as a kernel module would be inferior to executing during boot? You could disable everything you need inside the kernel module too. – Shahbaz Mar 24 '14 at 16:27
  • @VisaisRacism: Would you want to disable things like IRQs and PCI bus mastering transfers (basically, any kind of IO) while the OS is potentially under load? – Brendan Mar 24 '14 at 16:30
  • I actually don't know how things would work (hence the question). I imagine if you can pause the boot while you do the measurements, you should be able to pause everything from a kernel module, isn't that so? The reason I'm asking is that being able to do it as a kernel module makes the implementation and debugging much easier (since you eliminate a kernel compile on every try, and perhaps save a couple of restarts). – Shahbaz Mar 24 '14 at 16:56
  • @VisaisRacism: During boot (before you've started any device drivers, etc.) there's nothing to pause. – Brendan Mar 24 '14 at 17:21