Best way to update a clock every second in Linux

Question

When we talk about updating a clock every second in Linux, I think that something similar to the following code is what comes to mind.

while :; do date +%T; sleep 1; done

This piece of code has always bugged me, since it's an infinite loop running two commands every second, which means context switches that produce a slight spike in processor usage.

With that in mind, I'd like to know: is this really the best way to do this? Is there a more clever way to do it? If I were to reproduce this in a low-level language like C, for example, would the only way to do it still be an infinite loop with a printf to show the clock and a one-second sleep? That is, is there a way to avoid such context switching and use the CPU in a smarter way?

Context switching is the typical way for programs to not waste CPU cycles when they have nothing to do. Could you try to describe what you imagine "using the CPU in a smarter way" might look like? — zwol, Oct 13 '21 at 13:43
You could implement this in a C program or use some other scripting language like Perl or Python to avoid starting a `date` process every second. What's the problem with "*a slight spike in the processor usage*"? — Bodo, Oct 13 '21 at 13:46
In the first place, you absolutely should not be trying to reason about context switches in a shell script. Those occur at a much lower level. In the second place, your system already performs thousands of context switches per second. A couple more will not noticeably move the needle. — John Bollinger, Oct 13 '21 at 13:46
Honestly, I have no idea what "using the CPU in a smarter way" would look like. I'm just curious whether there's any way to do it in this context. — pvpscript, Oct 13 '21 at 13:50
The thing is _I_ have no idea what "using the CPU in a smarter way" would be either. But that doesn't mean there is no such way, it just means I do not understand what you are imagining. What do you imagine the CPU should be doing for the millions of cycles during each second, that are not required for the display update, and that doesn't involve switching to some other process? — zwol, Oct 13 '21 at 13:56
I was wondering whether it was possible to avoid switching back and forth (`display date -> other processes`) so much, but as I read the comments/answers I'm becoming pretty convinced that, by design, there's no such thing. — pvpscript, Oct 13 '21 at 17:26
The CPU spike is likely more due to the I/O operation than the context switch. What happens if you comment out the actual time output? — Martin James, Oct 15 '21 at 14:47
Adding to [Martin James](https://stackoverflow.com/users/758133/martin-james)' [comment](https://stackoverflow.com/questions/69556624/best-way-to-update-a-clock-every-second-in-linux#comment122997139_69556624): Instead of commenting out the `date` command, suppress the output: `while :; do date +%T > /dev/null; sleep 1; done` and check if you still see the spikes. — Bodo, Oct 20 '21 at 08:45

Peter Cordes · Answer 1 · 2021-10-14T13:10:44.967

You don't want to avoid context switches entirely; you want to let the kernel run other stuff during the 99% of the second when you're not running /usr/bin/date to format the time into a string and write(2) it to stdout. (Or put this CPU core to sleep, saving power. That doesn't actually count as a context switch, because software never changes page tables or saves/restores FP registers. Entering the kernel at all, even for a system call, does save/restore integer registers, though, and with software Meltdown mitigation enabled, on Intel CPUs that don't have a hardware fix for it, it will actually change page tables. Spectre mitigation that clears branch-prediction history is even more expensive.)

(A context switch is necessary anyway, to your terminal emulator or sshd or whatever is controlling the master side of the pseudo-terminal, unless you're running this on a Linux text console, like Ctrl+Alt+F2. Only in that case would the write to video RAM actually happen inside the write(1, buf, len) system call made by date, i.e. in the context of that process.)

If you want to minimize context switches (and system calls in general), you need to do the sleeping and writing from within a single process. But that's not possible in bash; it doesn't have a sleep builtin. (Bash does have printf '%(%T)T\n' $EPOCHSECONDS to print the current time, but busy-waiting around that would be terrible.) You'd want to write a program in C that just sleeps and prints the time.
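A minimal sketch of such a program, assuming a POSIX system (`print_clock_naive` is a hypothetical name, not an existing tool). It keeps the structure of the shell loop but formats and prints the time itself, so nothing is forked or exec'd; it still sleeps a fixed second after each print:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Print the local time as HH:MM:SS `count` times, sleeping a fixed second
 * after each print. Everything runs in this one process: nothing is forked
 * or exec'd. Like the shell loop, the fixed sleep lets error accumulate. */
int print_clock_naive(int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        time_t now = time(NULL);
        struct tm tm;
        char buf[16];

        localtime_r(&now, &tm);
        if (strftime(buf, sizeof buf, "%T", &tm) == 0)
            return -1;
        printf("%s\n", buf);
        fflush(stdout);               /* show each line now, even if stdout is a pipe */

        /* Fixed 1 s sleep; the time spent above is not subtracted. */
        struct timespec one_second = { .tv_sec = 1, .tv_nsec = 0 };
        nanosleep(&one_second, NULL);
    }
    return 0;
}
```

Calling `print_clock_naive(10)` from `main` prints ten timestamps. Each iteration is now a few system calls instead of two process startups, but the small per-iteration cost still adds up over time.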

A loop using a fixed 1-second delay will accumulate error, since it doesn't start the next second until after date has started and exited and the shell has forked/exec'd /usr/bin/sleep for the next iteration (plus startup overhead within the sleep executable).
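One way to avoid that accumulation in C is to sleep until an absolute deadline instead of for a fixed interval. This sketch assumes POSIX `clock_nanosleep` with `TIMER_ABSTIME` (available on Linux); `print_clock_on_the_second` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Print the time `count` times, each right after a whole second of
 * CLOCK_REALTIME. The deadline is absolute, so time spent formatting and
 * printing never pushes later iterations back: error does not accumulate. */
int print_clock_on_the_second(int count)
{
    struct timespec deadline;

    assert(count >= 0);
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return -1;

    for (int i = 0; i < count; i++) {
        /* Next target: the start of the following second.
         * (If we were stopped for several seconds, this catches up one
         * second per iteration instead of skipping ahead.) */
        deadline.tv_sec += 1;
        deadline.tv_nsec = 0;

        /* clock_nanosleep returns an error number instead of setting errno;
         * on EINTR just sleep again until the same absolute deadline. */
        int rc;
        while ((rc = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
                                     &deadline, NULL)) == EINTR)
            continue;
        if (rc != 0)
            return -1;

        time_t secs = deadline.tv_sec;
        struct tm tm;
        char buf[16];
        localtime_r(&secs, &tm);
        if (strftime(buf, sizeof buf, "%T", &tm) == 0)
            return -1;
        printf("%s\n", buf);
        fflush(stdout);
    }
    return 0;
}
```

Because the deadline is absolute, a slow iteration only delays that one print. A real clock would probably re-read the time after a long stop and skip the missed seconds rather than printing them in a burst.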

Without writing your own C program, you can get this down to just one fork/exec per second (and a bunch of other system calls) by using watch -p -t --exec, which runs a given command at an interval, directly with fork/exec instead of /bin/sh -c.

-t tells it not to print a header (which includes the time)
-p (precise) has it query the current time with clock_gettime and use nanosleep to avoid error accumulation, aiming for the same target time within a second every time. (The default is to sleep for a fixed interval between runs of your command, no matter how long it took.)
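I haven't checked watch's source, so this is only a sketch of one way to get the `-p` behavior: remember the start time, and before each sleep compute how far the current time is into the current period, then sleep for the remainder with a relative clock_nanosleep (flags 0), like the clock_nanosleep(CLOCK_REALTIME, 0, ...) calls in the strace output. `delay_until_next_tick` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Given when the schedule started and the current time, return the relative
 * delay until the next multiple of period_ns after start. Sleeping for that
 * keeps every run at the same phase, no matter how long the command took. */
struct timespec delay_until_next_tick(const struct timespec *start,
                                      const struct timespec *now,
                                      int64_t period_ns)
{
    assert(period_ns > 0);
    int64_t elapsed = to_ns(now) - to_ns(start);
    assert(elapsed >= 0);

    int64_t into_period = elapsed % period_ns;
    int64_t remaining = period_ns - into_period;   /* in (0, period_ns] */

    struct timespec delay = {
        .tv_sec = (time_t)(remaining / NSEC_PER_SEC),
        .tv_nsec = (long)(remaining % NSEC_PER_SEC),
    };
    return delay;
}
```

For example, with a 0.5 s period (as with -n 0.5) and 1.549 ms already spent in the current period, the helper returns a delay of 0 s + 498451000 ns. A caller would pass that to clock_nanosleep(CLOCK_REALTIME, 0, &delay, NULL).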

We can trace its system calls to see what it does. (I used a shorter interval so I didn't have to leave it running as long.) Note that clock_gettime doesn't show up in strace because it doesn't enter the kernel: the glibc wrapper calls into the vDSO implementation. That code, exported by the kernel and mapped into every user-space process, reads data the kernel also exports: a coarse time updated by the kernel's timer interrupts, plus a scale factor and offset for rdtsc, used to interpolate from the coarse time to the current time. That works because modern x86-64 systems have a precise constant-frequency counter readable from user space.

(watch actually prints on the "alternate" screen, so the output is gone from your terminal when it exits; that part of the output was faked for example purposes. The rest is copy/pasted from a terminal emulator, with ## comments added.)

  # use strace -f ...  to trace into child processes, and see all the syscalls from date
$ strace -o foo.tr    watch -p -t -n 0.5 --exec   date +%T
22:31:54
control-C

$ less foo.tr
... startup stuff from watch, including some terminal-size ioctl

pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0f744fba10) = 3832377
   # Linux implements fork() in terms of clone(2)
close(4)                                = 0
fcntl(3, F_GETFL)                       = 0 (flags O_RDONLY)
newfstatat(3, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
   # (IDK why it's doing an fstat on the pipe FD)
read(3, "22:16:45\n", 4096)             = 9
read(3, "", 4096)                       = 0
   # reads from the pipe until EOF
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3832377, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
close(3)                                = 0
   # then closes it
wait4(3832377, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3832377
rt_sigaction(SIGTSTP, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f0f7453ada0}, {sa_handler=0x7f0f746f4790, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f0f7453ada0}, 8) = 0
   # and waits for the child PID
write(1, "\33[?1049h\33[22;0;0t\33[1;42r\33(B\33[m\33["..., 46) = 46
   # clears the screen and moves cursor to the top left
write(1, "22:16:45\33[42;134H", 17)     = 17
   # and copies what it read from the pipe earlier.
rt_sigaction(SIGTSTP, {sa_handler=0x7f0f746f4790, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f0f7453ada0}, NULL, 8) = 0

## There's a clock_gettime() somewhere, probably here,
##  but the vDSO implementation avoids entering the kernel so strace doesn't see it.
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=498451000}, NULL) = 0
  # After calculating the exact time until the next event
  # tell the kernel we're done until then

  # Then the cycle starts over again when it wakes
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0f744fba10) = 3832378
close(4)                                = 0
fcntl(3, F_GETFL)                       = 0 (flags O_RDONLY)
...
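For comparison, the pipe / fork (clone) / read-until-EOF / close / wait cycle in that trace can be sketched in C with the usual POSIX calls. This is an illustration, not watch's actual code; `run_and_capture` is a hypothetical name, and it assumes the command's output fits in the buffer:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) with stdout redirected into a pipe, read
 * its output into buf (NUL-terminated), then reap the child.
 * Returns the number of bytes read, or -1 on any failure. */
ssize_t run_and_capture(char *const argv[], char *buf, size_t bufsize)
{
    int fds[2];

    assert(bufsize > 0);
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();                  /* shows up as clone() in strace */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);     /* stdout -> write end of the pipe */
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                      /* only reached if exec failed */
    }

    close(fds[1]);                       /* parent keeps only the read end */
    size_t used = 0;
    ssize_t n;
    while (used < bufsize - 1 &&
           (n = read(fds[0], buf + used, bufsize - 1 - used)) > 0)
        used += (size_t)n;               /* keep reading until EOF (n == 0) */
    buf[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)used;
}
```

With argv = {"date", "+%T", NULL} this returns 9 and leaves a string like "22:16:45\n" in buf, which a caller could then write to the terminal as watch does.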

watch without -t will print the current time as part of its header. So if that's what you want, you don't need date anymore.

But it doesn't have an option to run no program at all. It also stats /etc/localtime every time, in case the current timezone has changed.

You could use /bin/true, but that still has to be forked and exec'd, and pay its dynamic-linker startup overhead. Or you could use watch --exec /nonexistent and let it print an execve error every time. But even then it would still fork before trying to exec, creating a new PID and context-switching to it.
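To see how big that fork/exec/dynamic-linking overhead actually is on a given machine, here is a rough sketch that times fork + execvp + waitpid of a trivial command (`spawn_cost_us` is a hypothetical name; results vary a lot between systems):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock cost, in microseconds, of fork + execvp + waitpid for
 * argv, over `runs` runs. This is roughly what a shell loop pays for every
 * external command it starts. Returns -1.0 on failure. */
double spawn_cost_us(char *const argv[], int runs)
{
    struct timespec t0, t1;

    assert(runs > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127);               /* exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total_us = (double)(t1.tv_sec - t0.tv_sec) * 1e6
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
    return total_us / runs;
}
```

Calling it with {"true", NULL} and a few hundred runs gives the per-process startup cost that the original `while :; do date +%T; sleep 1; done` loop pays twice per second.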

Thanks for such a thorough answer, I definitely learned a lot by reading it! I particularly loved the fact that you even mentioned Spectre and Meltdown mitigations, and I also really liked the commented `strace`. — pvpscript, Oct 14 '21 at 13:48
@pvpscript: Glad you found those parts useful! Yeah, Spectre+Meltdown mitigation makes a trivial system call [maybe something like 10x](https://stackoverflow.com/questions/48913091/fastest-linux-system-call/48913805#comment90359576_48914200) as expensive as just the round-trip to kernel mode and back (for a bad call number returning `-ENOSYS`), on a Skylake CPU, so it's relevant to caring about the cost of each system call, not just process startup. Plus the cost in draining the out-of-order exec back end, and extra cache / TLB / branch misses after kernel code pollutes those caches. — Peter Cordes, Oct 14 '21 at 13:59

Steve Summit · Answer 2 · 2021-10-13T21:55:35.350

I doubt there's a way to avoid the context switch — or if there were a way, it would be more wasteful than techniques involving sleep.

The real problem with techniques epitomized by your

while :; do date +%T; sleep 1; done

is that they lose time. For example, if I run this modification, incorporating my own dateexpr program that has, among other things, the ability to work with subseconds:

while :; do dateexpr +%H:%M:%.2S now; sleep 1; done

this is what I see:

10:13:48.40
10:13:49.41
10:13:50.43
10:13:51.44
10:13:52.46
10:13:53.47
10:13:54.49
10:13:55.50

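So it looks like the "context switches" (the overhead of firing up each sleep and date or dateexpr process) are adding 10-20 ms per iteration: successive timestamps are 1.01 to 1.02 s apart, so the clock falls 0.10 s behind in just 7 iterations.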

I've written a program (in C) to get around this. It continually monitors the time, and computes a value of slightly less than a second to sleep for, so that it can invoke a subcommand exactly once per second, on the second. It looks like this:

$ synchro dateexpr +%H:%M:%.2S now
10:17:11.01
10:17:12.01
10:17:13.01
10:17:14.01
10:17:15.01
10:17:16.01
10:17:17.01

There's still that 10ms error in starting up the invoked process, but at least it doesn't accumulate.
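Since the synchro source isn't posted here, this is only a hypothetical sketch of the idea described above, not Steve Summit's program: read the time, sleep for whatever is left of the current second (slightly less than a second), then fork/exec the command and wait for it. `run_on_each_second` is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical synchro-like loop (NOT the actual synchro program):
 * `count` times, sleep until the next whole second of CLOCK_REALTIME,
 * then fork/exec argv and wait for it. The sleep is recomputed from the
 * current time on every iteration, so the command's startup cost delays
 * each run slightly but does not accumulate. */
int run_on_each_second(char *const argv[], int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        struct timespec now;
        if (clock_gettime(CLOCK_REALTIME, &now) != 0)
            return -1;

        /* Whatever is left of the current second: in (0, 1 s]. */
        long remaining = NSEC_PER_SEC - now.tv_nsec;
        struct timespec delay = { .tv_sec = remaining / NSEC_PER_SEC,
                                  .tv_nsec = remaining % NSEC_PER_SEC };
        while (nanosleep(&delay, &delay) != 0)
            if (errno != EINTR)           /* on EINTR, sleep the rest */
                return -1;

        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127);                   /* exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
    }
    return 0;
}
```

Each run still starts a few milliseconds after the second boundary (the process startup cost), but because the next sleep is computed from the clock rather than fixed, that error stays constant instead of growing.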

But in order to do its job, my synchro program has to make a bunch more system calls, so there are actually more context switches, not fewer.

But, of course, in general calling something like sleep is the right thing to do when you want to pause for a while, because you are explicitly relinquishing control, and the OS knows it doesn't have to schedule your process to run at all, so you place minimal load on the rest of the system while you're sleeping. Yes, there are a couple of context switches involved, but they seem minimal, a small price to pay, and as I said, I don't think you can get around them.

I've wondered if there was a way to run a clock or timer entirely in user space, and perhaps that's what you're asking, too. But I doubt there is one, because there's nothing [Footnote 1] you can get your hands on in user space that gives you any information about time or clocks; that information is all over in the kernel, meaning it's going to take a system call to get to it.

(Here I'm thinking exclusively about a process running under a conventional, multitasking OS, of course. If you were writing embedded code for a microprocessor with an RTC, there's no question you could do exactly what you want, with no context switches at all.)

There's one slim possibility: under at least some (perhaps, these days, most) versions of Linux, there's a mechanism called vDSO that lets certain system calls be carried out in user space, without the need for a context switch. The premier candidate for this special treatment is gettimeofday and its relatives. So, on a system using vDSO, you could write a program with a busy-wait loop, repeatedly calling gettimeofday (or time or clock_gettime, if those use vDSO also) until the desired time arrived, and because of vDSO, you'd be doing this without context switches. But of course busy-waiting is an almost irredeemably horrible idea, so I'm not seriously recommending this. (That's what I meant at the beginning of this answer when I said "if there were a way, it would be more wasteful than techniques involving sleep.")
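A sketch of that busy-wait, shown only to make the cost concrete (`spin_for_ns` is a hypothetical name). It assumes clock_gettime is served by the vDSO, which is typical on x86-64 Linux; if so, running it under strace shows no clock_gettime calls at all:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-wait until `ns` nanoseconds have passed on CLOCK_MONOTONIC and return
 * how many times the clock was polled inside the loop. With a vDSO
 * clock_gettime this never enters the kernel, but it keeps one core
 * 100% busy for the whole wait. */
uint64_t spin_for_ns(int64_t ns)
{
    uint64_t polls = 0;

    assert(ns >= 0);
    const int64_t deadline = now_ns() + ns;
    do {
        polls++;
    } while (now_ns() < deadline);
    return polls;
}
```

Dividing the wait time by the returned poll count gives a rough per-call cost of clock_gettime. The core burns cycles the whole time, which is exactly the waste this answer warns about; spinning only makes sense for waits far shorter than it takes the kernel to switch to another task and back.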

Footnote 1. I said there's "nothing you can get your hands on in user space that gives you any information about time or clocks", but that's not quite true. As the comments from Peter Cordes remind us, Intel processors, at least, give us the "Time Stamp Counter" and the rdtsc instruction to read it. This is a potentially vital — but also hugely problematic! — tool for writing certain high-precision timing applications, but I've never used it so I won't try to explain it or its caveats.

Assuming a recent version of `bash`, `printf '%(%T)T\n'` lets you get rid of the call to `date`, at least. — chepner, Oct 13 '21 at 14:41
Thank you so much for that, it was a very insightful answer! I also looked at your projects and couldn't find `synchro` (they are all really neat ideas, by the way). Do you mind sharing it, or at least explaining how you achieved it? — pvpscript, Oct 13 '21 at 17:32
@pvpscript I'm happy to post the code for `synchro`. If I forget, feel free to ping or email me. — Steve Summit, Oct 13 '21 at 17:42
Yes, `clock_gettime()` is purely user-space on modern i386/x86-64 Linux. The glibc wrapper for it knows that it's available in the vDSO, and on normal systems the vDSO code reads a global variable updated by the kernel's timer interrupts, and (on modern x86-64) uses `rdtsc` and some scale factors also from the vDSO to interpolate a small correction to the coarse system time. On a system without high-precision timing accessible from user-space (e.g. non-constant TSC, like before Core 2), the vDSO entry for clock_gettime would instead contain code that used `syscall`, I imagine. — Peter Cordes, Oct 13 '21 at 21:16
If you know the TSC frequency on your CPU, you can even spin-wait for a deadline without any vDSO or kernel interaction. (Only appropriate for waits less than a microsecond, maybe less, and only if you need to very exactly hit the wake-up time, because yes this is *spinning*, not sleeping). [How to calculate time for an asm delay loop on x86 linux?](https://stackoverflow.com/q/49924102) has an example. Future CPUs will have a `tpause` instruction to save some power while doing that, but of course still only sensible for waits too short to let the kernel context switch to another task and back — Peter Cordes, Oct 13 '21 at 21:21
@PeterCordes Thanks eversomuch for that info about the tsc. If I once knew that, I had completely forgotten. Footnote added to answer. — Steve Summit, Oct 13 '21 at 21:58
Forgot to mention, even on a system without high-precision user-space timing, Linux `clock_gettime(CLOCK_REALTIME_COARSE)` is *just* the coarse time, without doing extra rdtsc to refine it. Its precision depends on timer-interrupt interval. [The man page](https://man7.org/linux/man-pages/man2/clock_gettime.2.html) says support for that flag depends on per-architecture support in the vdso. Or `CLOCK_MONOTONIC_COARSE` if you're waiting for an interval, rather than waiting until the system time = whatever. — Peter Cordes, Oct 13 '21 at 22:17
Coarse times are probably no better than the precision you'd get from nanosleep, so maybe only useful when deciding whether you're close enough to just spin instead of sleep at all. — Peter Cordes, Oct 13 '21 at 22:20
While working on my own answer, I noticed that GNU coreutils `date` has a `%N` format to print the nanoseconds portion of the time. — Peter Cordes, Oct 14 '21 at 01:38
'10-20 ms' for a context switch? Only if you are running it on an abacus:) — Martin James, Oct 15 '21 at 14:48
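To see the precision difference described in the comments about coarse clocks, clock_getres reports the resolution the kernel advertises for each clock. A minimal sketch, assuming Linux and glibc (the *_COARSE clock IDs are Linux-specific); `report_resolution` is a hypothetical helper:

```c
#define _GNU_SOURCE   /* make sure glibc exposes the Linux-only *_COARSE clock IDs */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Print and return the resolution (in ns) the kernel reports for clock `id`,
 * or -1 if the clock isn't supported. */
long report_resolution(clockid_t id, const char *name)
{
    struct timespec res;

    assert(name != NULL);
    if (clock_getres(id, &res) != 0)
        return -1;
    long ns = (long)res.tv_sec * 1000000000L + res.tv_nsec;
    printf("%-24s %ld ns\n", name, ns);
    return ns;
}
```

Calling it for CLOCK_REALTIME, CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE typically shows 1 ns for the first (on a kernel with high-resolution timers) and one timer tick for the coarse clocks, a few milliseconds depending on the kernel's tick rate.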
