4

When trying to disassemble every instruction of a program as this SO answer suggests, I found out that gdb will never finish processing after stepping instruction into gettimeofday. Here is a minimal reproducible example:

In main.c:

#include <stdio.h>
#include <stdlib.h>

#include <sys/time.h>

int main()
{
    struct timeval tv;
    
    gettimeofday(&tv, NULL);

    return 0;
}

Compile with gcc main.c

To run: gdb -silent a.out

Reading symbols from a.out...
(gdb) set height 0
(gdb) b main
Breakpoint 1 at 0x1175: file main.c, line 7.
(gdb) set logging file ./log.txt
(gdb) set logging redirect on
(gdb) set logging on
Redirecting output to ./log.txt.
Copying debug output to ./log.txt.
(gdb) r
(gdb) while 1
 >si
 >end

And log.txt shows:

Starting program: /home/user/tst-gdb-step/a.out 

Breakpoint 1, main () at main.c:7
7   {
0x000055555555517e  7   {
0x0000555555555182  7   {
10      gettimeofday(&tv, NULL);
0x0000555555555188  10      gettimeofday(&tv, NULL);
0x000055555555518d  10      gettimeofday(&tv, NULL);
0x0000555555555190  10      gettimeofday(&tv, NULL);
0x0000555555555070 in gettimeofday@plt ()
0x0000555555555074 in gettimeofday@plt ()
0x00007ffff7fcd690 in gettimeofday ()
0x00007ffff7fcd691 in gettimeofday ()
0x00007ffff7fcd698 in gettimeofday ()
0x00007ffff7fcd69b in gettimeofday ()
0x00007ffff7fcd69d in gettimeofday ()
0x00007ffff7fcd69e in gettimeofday ()
0x00007ffff7fcd6a1 in gettimeofday ()
0x00007ffff7fcd6a7 in gettimeofday ()
0x00007ffff7fcd6aa in gettimeofday ()
0x00007ffff7fcd6ae in gettimeofday ()
0x00007ffff7fcd6b4 in gettimeofday ()
0x00007ffff7fcd6ba in gettimeofday ()
0x00007ffff7fcd6bd in gettimeofday ()
0x00007ffff7fcd6c3 in gettimeofday ()
0x00007ffff7fcd6c6 in gettimeofday ()
0x00007ffff7fcd6c8 in gettimeofday ()
0x00007ffff7fcd6cc in gettimeofday ()
0x00007ffff7fcd6cf in gettimeofday ()
0x00007ffff7fcd6d2 in gettimeofday ()
0x00007ffff7fcd6d8 in gettimeofday ()
0x00007ffff7fcd6df in gettimeofday ()
0x00007ffff7fcd6e6 in gettimeofday ()
0x00007ffff7fcd6ed in gettimeofday ()
0x00007ffff7fcd6f0 in gettimeofday ()
0x00007ffff7fcd6f2 in gettimeofday ()
0x00007ffff7fcd6f5 in gettimeofday ()
0x00007ffff7fcd6f9 in gettimeofday ()
0x00007ffff7fcd6fc in gettimeofday ()
0x00007ffff7fcd702 in gettimeofday ()
0x00007ffff7fcd709 in gettimeofday ()
0x00007ffff7fcd70c in gettimeofday ()
0x00007ffff7fcd70f in gettimeofday ()
0x00007ffff7fcd6a7 in gettimeofday ()
0x00007ffff7fcd6aa in gettimeofday ()
0x00007ffff7fcd6ae in gettimeofday ()
0x00007ffff7fcd6b4 in gettimeofday ()
0x00007ffff7fcd6ba in gettimeofday ()
0x00007ffff7fcd6bd in gettimeofday ()
0x00007ffff7fcd6c3 in gettimeofday ()
0x00007ffff7fcd6c6 in gettimeofday ()
0x00007ffff7fcd6c8 in gettimeofday ()
0x00007ffff7fcd6cc in gettimeofday ()
0x00007ffff7fcd6cf in gettimeofday ()
0x00007ffff7fcd6d2 in gettimeofday ()
0x00007ffff7fcd6d8 in gettimeofday ()
0x00007ffff7fcd6df in gettimeofday ()
...

After 3 whole minutes of exeuction, gdb writes nearly 500,000 lines to log.txt (and without any sign to stop), which is obviously not the normal case, since vdso is meant to be fast. The log also shows an infinite loop.

But if n instead of si is used, the program can exit without any problem.

Tools I'm using:

$ uname -a
Linux 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ gdb --version
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2

I'm wondering why does this happen. Thanks!

yhdang
  • 169
  • 8
  • How long does it have without logging? With logging, it takes a few seconds on my PC and generates a large logfile for about 259e3 instructions (and over 30e6 of lines). I guess the 3 min are because of the disk usage, maybe you could write it to a `tmpfs` (assuming you have enough RAM). – 12431234123412341234123 Jul 04 '23 at 16:06

2 Answers2

3

This is a guess based on source diving: I believe single stepping through the guts of gettimeofday is slowing those guts down so much that they get stuck in an endless loop trying and failing to meet their internal accuracy requirement for the result.

The code for gettimeofday on x86-64/Linux is in linux/lib/vdso/gettimeofday.c. The "guts" I'm talking about are the do_hres function. (I've linked to kernel 5.15.0 because that's what you have, but this file doesn't change often.) Notice the loop in this function. Inside the loop, it's calculating the time as, essentially, (coarse counter maintained by kernel) + (hardware high-precision clock that overflows often). Every time the kernel updates the coarse counter, it also updates the "sequence number" being read in the loop condition. The designer expected this loop to cycle at most twice -- the point is that if you enter the loop at just the wrong moment and read inconsistent data about the coarse counter, thus computing a nonsense time, you'll retry and get it right the second time around.

But stepping through this code instruction by instruction, even in an automated way, makes it so slow that the kernel always changes the coarse counter and the sequence number while the loop body is executing, and so you get stuck in the loop.

(Do not underestimate just how slow it is to step through machine code instruction by instruction, compared to normal execution. You're adding the time cost of two context switches and several commands in GDB's not particularly speedy scripting language to every instruction. I wouldn't be surprised if the context switches and attendant cache and TLB thrashing were enough to trigger this livelock all by themselves.)

zwol
  • 135,547
  • 38
  • 252
  • 361
  • 1
    Thank you very much for your answer. Indeed, stepping instructions as well as logging them will have a very large overhead, and that would take long enough to break some assumptions. Then, my question turns into *how to step instruction through a program, in the presence of `gettimeofday`*. Is that technically possible? – yhdang Jul 05 '23 at 03:43
  • There seems to be [method to avoid stepping in some functions](https://stackoverflow.com/questions/5676241/tell-gdb-to-skip-standard-files), but it does not work for stepping instructions. – yhdang Jul 05 '23 at 04:06
  • 1
    You seem to have found a way to work around your problem, but I would suggest you look at [rr](https://rr-project.org/). – zwol Jul 12 '23 at 12:38
  • This seems like a way more graceful solution -- thanks! – yhdang Jul 13 '23 at 11:39
1

Here is the gdb script I use to bypass this problem:

# load the program binary, symbol table, etc., and start it, requiring gdb > 8.1
starti
# print location for every instruction
display/i $pc
# tell gdb to output without caring window height
set height 0

while 1
    # run the functions to completion if they cause problems
    if ($pc == gettimeofday)
        fin
    end
    if ($pc == clock_gettime)
        fin
    end
    # optional:
    # newer x86 version of `memset` uses `rep` instructions
    # which will expand to a whole lot of lines. So ignore it too.
    if ($pc == memset)
        fin
    end
    # most importantly, step instruction on default
    si
end

This does not work correctly for all cases (e.g., branching on time-related events) and is obviously slow, but it does the job for me.

yhdang
  • 169
  • 8