
I am wondering how sleep/nanosleep is implemented internally. Consider this code:

#include <unistd.h>

void *worker(void *arg) /* runs on a thread other than main() */
{
  (void)arg;
  while (1)
  {
    /* do something */
    sleep(1);
  }
  return NULL; /* never reached */
}

Would the CPU be constantly context switching to check whether the 1-second sleep is over (i.e., an internal busy wait)?

I doubt it works this way; it would be far too inefficient. But then how does it work?

Same question applies to nanosleep.

Note: If this is implementation/OS-specific, how can I implement a more efficient scheme that doesn't lead to constant context switching?

Roman Nikitchenko
Kam
  • What's wrong with the question? – Kam Jun 21 '15 at 21:25
  • _"What's wrong with the question?"_ Too broad. OS specific, implementation specific. – πάντα ῥεῖ Jun 21 '15 at 21:27
  • POSIX doesn't specify how it should be implemented – StenSoft Jun 21 '15 at 21:28
  • @StenSoft And [c++], as the question was originally tagged, specifies it even less. – πάντα ῥεῖ Jun 21 '15 at 21:31
  • You probably can't implement a better alternative without knowing the OS you work on much better. Most often it's not worth the time. Make a note in a good, visible place and wait until someone files a bug report. – WorldSEnder Jun 21 '15 at 21:35
  • sleep normally puts the process into a waiting queue, where it waits to be rescheduled once the sleep period has expired. So when the OS decides to switch context (usually triggered by a timer), it browses the queued processes and activates the one that is ready to be executed (plus some priority benefits). – Archie Jun 21 '15 at 21:39
  • The passage of real time is not something you can control from within C++ without recourse to platform facilities (perhaps wrapped in the `` library), since real time is a property of your OS scheduler. "Sleeping" involves instructing the OS scheduler. Intrinsically, a program can never *not* run. It must always move forward, from one instruction to the next. Only the OS scheduler can stop a program from running. – Kerrek SB Jun 21 '15 at 22:22
  • @Archie it's common to put the processes/threads into a queue-style container that is sorted by expiry time. The OS does not need to browse; it just has to pop the item at the head of the queue. – Martin James Jun 21 '15 at 23:07
  • Possible duplicate of [What's the algorithm behind sleep()?](http://stackoverflow.com/questions/175882/whats-the-algorithm-behind-sleep) – Ciro Santilli OurBigBook.com Nov 17 '15 at 07:33

4 Answers


The typical way to implement sleep() and nanosleep() is to convert the argument into whatever scale the OS's scheduler uses (while rounding up) and add the current time to it to form an "absolute wake up time"; then tell the scheduler not to give the thread CPU time until after that "absolute wake up time" has been reached. No busy waiting is involved.

Note that whatever scale the OS's scheduler uses typically depends on what hardware is available and/or being used for time keeping. It can be smaller than a nanosecond (e.g. local APIC on 80x86 being used in "TSC deadline mode") or as large as 100 ms.

Also note that the OS guarantees that the delay won't be less than what you ask for; but there's typically no guarantee that it won't be longer and in some cases (e.g. low priority thread on a heavily loaded system) the delay can be much much larger than requested. For example, if you ask to sleep for 123 nanoseconds then you might sleep for 2 ms before the scheduler decides it can give you CPU time, and then it might be another 500 ms before the scheduler actually does give you CPU time (e.g. because other threads are using the CPU).

Some OSs may try to reduce this "slept much longer than requested" problem, and some OSs (e.g. ones designed for hard real-time) may provide some sort of guarantee (with restrictions, e.g. subject to thread priority) for the maximum time between delay expiry and getting the CPU back. To do this, the OS/kernel would convert the argument into whatever scale the OS's scheduler uses (rounding down instead of up) and may subtract a tiny amount "just in case", so that the scheduler wakes the thread up just before the requested delay expires (and not after). Then, when the thread is given CPU time (after the cost of the context switch to the thread, and possibly after pre-fetching various cache lines the thread is guaranteed to use), the kernel would busy wait briefly until the delay has actually expired. This allows the kernel to pass control back to the thread extremely close to delay expiry.

For example, if you ask to sleep for 123 nanoseconds, the scheduler might not give you CPU time for 100 nanoseconds, then it might spend 10 nanoseconds switching to your thread, and then it might busy wait for the remaining 13 nanoseconds. Even in this case (where busy waiting is done), it normally won't busy wait for the full duration of the delay. However, if the delay is extremely short, the kernel would only do the final busy waiting.

Finally, there is a special case that may be worth mentioning. On POSIX systems sleep(0); is often abused as a yield(). I'm not too sure how legitimate this practice is: it's impossible for a scheduler to support something like yield() unless that scheduler is willing to waste CPU time doing unimportant work while more important work waits.

Brendan

The POSIX specifications of sleep and nanosleep say:

The sleep() function shall cause the calling thread to be suspended from execution until either the number of realtime seconds specified by the argument seconds has elapsed or a signal is delivered to the calling thread and its action is to invoke a signal-catching function or to terminate the process. The suspension time may be longer than requested due to the scheduling of other activity by the system.

(Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/sleep.html.)

and

The nanosleep() function shall cause the current thread to be suspended from execution until either the time interval specified by the rqtp argument has elapsed or a signal is delivered to the calling thread, and its action is to invoke a signal-catching function or to terminate the process. The suspension time may be longer than requested because the argument value is rounded up to an integer multiple of the sleep resolution or because of the scheduling of other activity by the system. But, except for the case of being interrupted by a signal, the suspension time shall not be less than the time specified by rqtp, as measured by the system clock CLOCK_REALTIME.

(Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/nanosleep.html.)

I read that to say that a POSIX-compliant system cannot use a busy loop for sleep or nanosleep. The calling thread needs to be suspended from execution.

David Hammen

The exact implementation is not specified, but you can expect some properties.

sleep(3) is usually quite inaccurate, and as the Linux sleep(3) man page states, it could even be implemented using SIGALRM (signals). So it is definitely not about performance. It is definitely not implemented with spin locks either, so it cannot be CPU intensive.

nanosleep is quite a different animal and could even be implemented using spinlocks. More importantly, at least on Linux, the nanosleep man page is in section 2, which means it is a system call, so at a minimum it involves a switch to kernel mode. Do you really need its high resolution?

UPDATE

Having seen your comment, I recommend using select(), as the select() man page shows:

   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/select.h>
   #include <sys/time.h>
   #include <sys/types.h>
   #include <unistd.h>

   int
   main(void)
   {
       fd_set rfds;
       struct timeval tv;
       int retval;

       /* Watch stdin (fd 0) to see when it has input. */
       FD_ZERO(&rfds);
       FD_SET(0, &rfds);

       /* Wait up to five seconds. */
       tv.tv_sec = 5;
       tv.tv_usec = 0;

       retval = select(1, &rfds, NULL, NULL, &tv);
       /* Don't rely on the value of tv now! */

       if (retval == -1)
           perror("select()");
       else if (retval)
           printf("Data is available now.\n");
           /* FD_ISSET(0, &rfds) will be true. */
       else
           printf("No data within five seconds.\n");

       exit(EXIT_SUCCESS);
   }

It is a proven mechanism if you need a thread to sleep until some event occurs and that event can be linked to a file descriptor.

Roman Nikitchenko
  • I am not looking for high resolution; I am more worried about the performance of my other threads under constant context switching, because I have a thread that is sleeping 99.9% of the time (with sleep(30)) – Kam Jun 21 '15 at 22:00
  • 30 seconds is an eternity; you can safely throw away the thread's CPU quanta with sleeps of even a bunch of milliseconds without the system noticing at all. – Matteo Italia Jun 21 '15 at 22:46
  • Note: This question is currently tagged [posix], not [linux]. In POSIX, `nanosleep` cannot be implemented with spin locks. The thread needs to be suspended. Whether that's the case in Linux is a different question, because Linux is not POSIX compliant. – David Hammen Jun 21 '15 at 22:54

"I am wondering how is sleep/nanosleep internally implemented?"

There isn't a single implementation; each OS and each POSIX-compliant implementation of sleep() and nanosleep() is free to implement this feature however it chooses.

So asking how it's actually done is pretty pointless without the context of a particular OS/POSIX library implementation.

πάντα ῥεῖ