Minimum time a thread can pause in Linux

Question

In my application, threads need to pause for a very short time (hundreds of clock cycles). One way to pause is to call nanosleep, but I suppose that requires a system call into the kernel. I want to pause without entering the kernel.

Note that I have enough cores to run my threads on and I bind each thread to a separate core, so even an instruction that can halt the core for a little while would be good. I am using x86. I just want the thread to halt while pausing. I don't want a busy loop or a system call to the kernel. Is it possible to do this? What is the minimum time I can pause a thread?

This post has interesting discussion on this topic. http://stackoverflow.com/questions/4725676/how-does-x86-pause-instruction-work-in-spinlock-and-can-it-be-used-in-other-sce — MetallicPriest, Sep 10 '11 at 13:13

Steve-o · Accepted Answer · score 7 · 2011-09-10T16:59:46.710

_mm_pause in a busy-wait loop is the way to go.

Unfortunately, the delay it provides varies between processor families:

http://siyobik.info/main/reference/instruction/PAUSE

Example usage for GCC on Linux:

#include <xmmintrin.h>  /* declares _mm_pause() in GCC */

int main (void) {
    _mm_pause();  /* spin-wait hint: a short delay, no system call */
    return 0;
}

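Compile with SSE enabled (-march=native turns on every extension the host CPU supports; on x86-64, SSE is available by default anyway):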

gcc -o moo moo.c  -march=native

Alternatively, you can use inline assembly:

__asm volatile ("pause" ::: "memory");

This figure from Intel engineers may help you estimate the cost of pausing:

NOP instruction can be between 0.4-0.5 clocks and PAUSE instruction can consume 38-40 clocks.

http://software.intel.com/en-us/forums/showthread.php?t=48371

edited Sep 10 '11 at 16:59

answered Sep 10 '11 at 13:46

Steve-o

How to use _mm_pause in gcc? Which header file is required? – MetallicPriest Sep 10 '11 at 16:05
Steve-o, thanks, but what is the purpose of ::: "memory" here? Is it memory fence or something? If so, wouldn't it increase cache coherence load? – MetallicPriest Sep 10 '11 at 16:47
@MetallicPriest it is to make sure the pause runs in the correct place. – Steve-o Sep 10 '11 at 16:58
`nop` doesn't stall out-of-order execution, or even need any back-end execution resources. `nop` throughput is 4 per clock on everything since Core2 (except Atom). https://agner.org/optimize/. Anyway, `nop` doesn't "take cycles" any more than `add` does, it consumes front-end bandwidth and instruction-cache space. **Anyway, `pause` on Sandybridge-family is about 5 cycles until Skylake, where it's about 100 cycles. Are your numbers from Pentium4 or something?** – Peter Cordes Aug 18 '18 at 11:59

score 1 · Answer 2 · answered Sep 10 '11 at 16:11

Why not just spin-wait yourself? In a loop, repeatedly execute the rdtsc instruction to read the clock cycle count, and stop once the difference exceeds 100 clock cycles.

I presume it's for a trading system, where this is a common technique.

Foo Bah

rdtsc consumes cycles, which I don't want. I just want to halt a core. – MetallicPriest Sep 10 '11 at 16:13
@MetallicPriest If you have enough cores so that you can bind a thread to each core without overlapping, why do you want to halt the core? – Foo Bah Sep 10 '11 at 16:14
Foo Bah, to stop putting pressure on cache coherence and such things. – MetallicPriest Sep 10 '11 at 16:20

score 0 · Answer 3 · answered Sep 10 '11 at 13:47

It depends on what you mean by pause. If by pause you mean stopping the thread for a short period of time, only the OS can do this.

However, if by pause you mean a very short delay, you can do this with a busy loop. The problem with such a loop is that you don't know how long it actually runs. You can estimate it, but an interrupt can make it longer.

score 0 · Answer 4 · answered Sep 10 '11 at 14:38

Generally speaking, for such a short delay I would think a system call is not practical, because the overhead of the system call, scheduling and a context switch (and back again) is going to be far longer than your pause, as you seem to be aware already.

What you are left with is spinning (busy-waiting) to produce the delay. For example, you can loop reading TSC values to know how long to spin (or the applicable cycle-counter register on other processors).

Yes, spinning like this wastes power, and if you are running on a CPU with multiple hardware threads per core (as multi-core often implies), you are also needlessly taking execution slots from the other threads. But unless you have a very, very, very low-overhead system call and scheduler mechanism AND a high-resolution timer, I'd say it's not possible.

score -1 · Answer 5 · answered Sep 10 '11 at 13:07

No, it is not possible. Either call sleep or select, which go through the kernel, or run a loop that wastes time.

Ed Heal

R.. GitHub STOP HELPING ICE · Answer 6 · score -2 · 2011-09-10T14:07:15.613

Your only hope of getting timing that precise is, I think, to use timer_create and have the timer expiration delivered by a signal, together with real-time scheduling.

I'm not sure what it could possibly be useful for, though, since you cannot perform any IO in such a small time window, and therefore it should not matter if your code runs 1000 times with a 100ns gap between each run, or 1000 times all together with a 100ms gap at the end.

edited Sep 10 '11 at 14:07

answered Sep 10 '11 at 14:01

R.. GitHub STOP HELPING ICE

No, I can't afford that much overhead! – MetallicPriest Sep 10 '11 at 15:39
What are you trying to achieve? What useful could you possibly do in such little time that would depend on the time at which it happens? – R.. GitHub STOP HELPING ICE Sep 10 '11 at 16:08
It's for a specific purpose. The reason I want to pause is that I don't want to put pressure on cache coherence and such things.
