You don't want to avoid context switches entirely; you want to let the kernel run other stuff during the 99% of the second where it's not running /usr/bin/date to format the time into a string and write(2) it to stdout. (Or put this CPU core to sleep, saving power. That doesn't actually count as a context switch, because software never changed page tables or saved/restored FP registers. Entering the kernel at all, even for a system call, does save/restore integer registers, though, and with software Meltdown mitigation enabled on Intel CPUs that don't have a hardware fix, it also changes page tables. Spectre mitigation that clears branch-prediction history is even more expensive.)
(A context switch is necessary, to your terminal emulator or sshd or whatever is controlling the master side of the pseudo-terminal, unless you're running this on a Linux text console, like ctrl+alt+F2. Only in that case would writing to video RAM actually happen inside the write(1, buf, len) system call made by date, i.e. in the context of that process.)
If you want to minimize context switches (and system calls in general), you need to do the sleeping and writing from within a single process. But that's not practical in bash; it doesn't have a sleep builtin by default. (Bash does have printf '%(%T)T\n' $EPOCHSECONDS to print the current time, but busy-waiting around that would be terrible.) You'd want to write a program in C that just did sleeps and time-printing.
A loop using a fixed 1-second delay will accumulate error, since the next 1-second sleep doesn't start until after date has started and exited and the shell has forked/execed /usr/bin/sleep for the next iteration (plus startup overhead within the sleep executable).
Without writing your own C program, you can get this down to just one fork/exec per second (and a bunch of other system calls) by using watch -p -t --exec, which runs a given command at an interval, directly with fork/exec instead of via /bin/sh -c.
-t tells it not to print a header (which includes the time).
-p (precise) has it query the current time with clock_gettime and use nanosleep to avoid error accumulation, aiming for the same target time within a second every time. (The default is to sleep for a fixed interval between runs of your command, no matter how long it took.)
We can trace its system calls to see what it does. (I used a shorter sleep interval so I didn't have to leave it sitting as long.) Note that clock_gettime doesn't show up in strace because it doesn't enter the kernel; the glibc wrapper calls into the vDSO implementation. That code exported by the kernel (mapped into every user-space process) reads data exported by the kernel: a coarse time updated by the kernel's timer interrupts, and a scale factor/offset for rdtsc to interpolate an offset from the current coarse time, since modern x86-64 systems have a precise constant-frequency counter accessible from user-space.
(watch actually prints on the "alternate" screen, so the output is gone from your terminal when it exits; that part of the output was faked for example purposes. The rest is copy/pasted from a terminal emulator, with # comments added.)
# use strace -f ... to trace into child processes, and see all the syscalls from date
$ strace -o foo.tr watch -p -t -n 0.5 --exec date +%T
22:31:54
control-C
$ less foo.tr
... startup stuff from watch, including some terminal-size ioctl
pipe([3, 4]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0f744fba10) = 3832377
# Linux implements fork() in terms of clone(2)
close(4) = 0
fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
newfstatat(3, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
# (IDK why it's doing an fstat on the pipe FD)
read(3, "22:16:45\n", 4096) = 9
read(3, "", 4096) = 0
# reads from the pipe until EOF
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3832377, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
close(3) = 0
# then closes it
wait4(3832377, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3832377
rt_sigaction(SIGTSTP, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f0f7453ada0}, {sa_handler=0x7f0f746f4790, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f0f7453ada0}, 8) = 0
# and waits for the child PID
write(1, "\33[?1049h\33[22;0;0t\33[1;42r\33(B\33[m\33["..., 46) = 46
# clears the screen and moves cursor to the top left
write(1, "22:16:45\33[42;134H", 17) = 17
# and copies what it read from the pipe earlier.
rt_sigaction(SIGTSTP, {sa_handler=0x7f0f746f4790, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f0f7453ada0}, NULL, 8) = 0
## There's a clock_gettime() somewhere, probably here,
## but the vDSO implementation avoids entering the kernel so strace doesn't see it.
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=498451000}, NULL) = 0
# After calculating the exact time until the next event
# tell the kernel we're done until then
# Then the cycle starts over again when it wakes
pipe([3, 4]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0f744fba10) = 3832378
close(4) = 0
fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
...
watch without -t will print the current time as part of its header, so if that's what you want, you don't need date anymore. (It stats /etc/localtime every time in case the current timezone has changed.) But it doesn't have an option to not run any program.
You could use /bin/true, but that still has to get forked/execed and run its dynamic linker startup overhead. Or you could use watch --exec /non-existant and let it print an execve error every time. But even then it would still fork before trying to exec, creating a new PID and context-switching to it.