GDB hangs during remote debugging, library version mismatches

Question

I'm using Linux and am trying to remote-debug a program.

I launch gdbserver on the target, from .xinitrc with

gdbserver localhost:9134 /root/game/game

On my local PC, which runs inside a VirtualBox VM, I connect to the target from gdb with

target remote 192.168.1.20:9134

and it connects fine. I can set a breakpoint at main with

b main

and then I can continue and it will break there. I can single-step for a while until it reaches the call to SDL_Init(), from which it never returns to gdb. If I don't single-step into SDL_Init but instead set a breakpoint further on in the program, the program starts up and runs normally (so it gets past SDL_Init). But when it reaches the breakpoint, it freezes on the target machine and gdb on my local machine never shows a prompt. The entire thing hangs and must be restarted. The target is not completely frozen, however: the mouse pointer still moves and you can ping it, but the gdb connection no longer works. So it seems that the graphics system somehow interferes, since breakpoints before the graphics system init work but breakpoints after it do not.

I've tried setting the remotetimeout setting to 500 seconds and it exhibits the same behaviour. When I ping the remote target from my local pc the reported time is around 0.3 to 0.4 ms. So that doesn't seem out of the ordinary, but I wouldn't rule out any other misconfigured network settings on my part.

It's on a legacy system (but hey, it still makes money) with gdbserver version 6.8-19.fc10 and gdb version 6.8-29.fc10. Upgrading versions, while a very large headache, could be possible but probably should not be necessary (any upgrades I make to my pc have to also be done to a state regulator's system, as they use the gdb setup for their testing purposes, but it's not impossible). Remote debugging was working in the past before I took over the project, and no one who worked on it before is still around to ask. The gdbserver version definitely worked, as I'm using the exact program used previously.

Update 1:
I updated the gdb version on the host machine to version 7.0.1 and it still exhibits the same behavior. I couldn't do version 8 as it needs a C++11 compiler and the legacy system is before that time.

Update 2:
I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.

Update 3:
I dug out a serial cable and finally got remote debugging set up over serial. It still doesn't work, but it gives me more error messages. I get the error

gdbserver: error initializing thread_db library: version mismatch between libthread_db and libpthread

which I think makes sense, since my breakpoints quit working after the graphics system is initialized, which involves creating some threads. After googling that error, I've tried using set solib-absolute-prefix, set solib-search-path, and set sysroot, pointing each at the root directory of a copy of the target machine's filesystem on the host (on the host, that is /fw_dev/fgs/cf/initrd/expand, which contains the filesystem that the initrd is made from).
But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error. I've also tried setting those variables to the lib subdirectory, which doesn't work either. I also tried just copying the local thread libraries from the host's /lib directory to /lib on the target, but then X Windows won't even start.
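
A minimal sketch of the sysroot attempt described above, written as a shell helper that generates a gdb command file. The helper name write_gdb_commands and the file name remote.gdb are hypothetical; the sysroot path and remote address are the ones given in the question. It assumes that set sysroot should be issued before target remote, so that gdb looks for shared libraries in the copy of the target filesystem when it connects.

```shell
#!/bin/sh
# Hypothetical helper: write a gdb command file that points gdb at a
# host-side copy of the target's filesystem before connecting.
#   $1 = sysroot directory on the host
#   $2 = host:port of the running gdbserver
#   $3 = output file for the gdb commands
write_gdb_commands() {
    sysroot=$1
    remote=$2
    out=$3
    cat > "$out" <<EOF
set sysroot $sysroot
target remote $remote
info sharedlibrary
EOF
}

# Usage sketch (not run here):
#   write_gdb_commands /fw_dev/fgs/cf/initrd/expand 192.168.1.20:9134 remote.gdb
#   gdb -x remote.gdb /path/to/game
```

The info sharedlibrary line is only there to show which library files gdb actually picked up, which makes it easier to see whether the sysroot setting took effect.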

Update #4:
I tried launching gdb from the root of the copy of the target filesystem on the host (/fw_dev/fgs/cf/initrd/expand), and gdb still hangs on breakpoints but I no longer get the error message about libthread_db and libpthread mismatches, so back to the drawing board.

Update #5
Maybe I'm getting to where I should ask this as a separate question, but I compiled gdb, then ran gdb on itself. Then I used file to set it to the program on the host, set the remote target, set my breakpoints, and then ran continue. When I get to the breakpoint, gdb hangs as always. But now when I press ctrl-c in gdb, I get this backtrace

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00b3f39d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x08203b9a in ser_base_wait_for (scb=0x96a2930, timeout=1) at ser-base.c:206
#3  0x08203c89 in do_ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:256
#4  0x08204046 in generic_readchar (scb=0x96a2930, timeout=-1, do_readchar=0x8203c60 <do_ser_base_readchar>) at ser-base.c:326
#5  0x082040b0 in ser_base_readchar (scb=0x96a2930, timeout=-1) at ser-base.c:391
#6  0x081ecac2 in serial_readchar (scb=0x96a2930, timeout=-1) at serial.c:376
#7  0x080c4357 in readchar (timeout=<value optimized out>) at remote.c:5922
#8  0x080c5e35 in getpkt_or_notif_sane_1 (buf=0x839f140, sizeof_buf=0x839f144, forever=1, expecting_notif=0) at remote.c:6440
#9  0x080d1e0a in getpkt_sane (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:6534
#10 remote_wait_as (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4736
#11 remote_wait (ops=0x839f180, ptid=..., status=0xbffff388, options=0) at remote.c:4843
#12 0x08184d4b in target_wait (ptid=..., status=0xbffff388, options=0) at target.c:2098
#13 0x0815daf2 in wait_for_inferior (treat_exec_as_sigtrap=0) at infrun.c:2028
#14 0x0815ddd4 in proceed (addr=4294967295, siggnal=TARGET_SIGNAL_DEFAULT, step=0) at infrun.c:1652
#15 0x08153729 in continue_1 (all_threads=0) at infcmd.c:668
#16 0x08153ea2 in continue_command (args=0x0, from_tty=0) at infcmd.c:760
#17 0x0808e9e8 in execute_command (p=0x83b89a1 "", from_tty=0) at top.c:453
#18 0x0816b028 in command_handler (command=0x83b89a0 "c") at event-top.c:511
#19 0x0816bd5a in command_line_handler (rl=0x8ce83e8 "\340&\266\b\340\230\321\b") at event-top.c:736
#20 0x0822d5a5 in rl_callback_read_char () at callback.c:205
#21 0x0816b17b in rl_callback_read_char_wrapper (client_data=0x0) at event-top.c:178
#22 0x0816ac54 in handle_file_event (data=...) at event-loop.c:812
#23 0x08169e6b in process_event () at event-loop.c:394
#24 0x0816aba4 in gdb_do_one_event (data=0x0) at event-loop.c:459
#25 0x0816500b in catch_errors (func=0x816a950 <gdb_do_one_event>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#26 0x080f072a in tui_command_loop (data=0x0) at ./tui/tui-interp.c:153
#27 0x08165684 in current_interp_command_loop () at interps.c:291
#28 0x0808653b in captured_command_loop (data=0x0) at ./main.c:226
#29 0x0816500b in catch_errors (func=0x8086530 <captured_command_loop>, func_args=0x0, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#30 0x08085ecc in captured_main (data=0xbffff7a4) at ./main.c:902
#31 0x0816500b in catch_errors (func=0x80853d0 <captured_main>, func_args=0xbffff7a4, errstring=0x82ccc3d "", mask=6) at exceptions.c:510
#32 0x080851d1 in gdb_main (args=0xbffff7a4) at ./main.c:911
#33 0x08085195 in main (argc=128, argv=0x0) at gdb.c:33

So it seems gdb is hanging inside __kernel_vsyscall(). Doing a diff of the libc.so.6 in the /lib directory on the host and the libc.so.6 on the target reveals differences. I've tried using LD_PRELOAD and LD_LIBRARY_PATH, but that backtrace always shows /lib/libc.so.6 instead of pointing to the copy that the target has. Maybe I'm not setting them correctly, but I've tried setting them in gdb with set env and also setting them on the command line and exporting them, to no effect. I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc. So how do I get gdb to load different libraries?
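
The library comparison mentioned above can be sketched as a small shell helper. The name compare_lib is hypothetical, and the target-side path in the usage comment is an assumption based on the filesystem copy described in Update 3. The helper only reports whether two files are byte-identical; it says nothing about why they differ.

```shell
#!/bin/sh
# Hypothetical helper: report whether the host's copy of a library is
# byte-identical to the copy taken from the target's filesystem.
#   $1 = library path on the host
#   $2 = path of the same library in the target filesystem copy
compare_lib() {
    host_lib=$1
    target_lib=$2
    if cmp -s "$host_lib" "$target_lib"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Usage sketch (paths are assumptions, not run here):
#   compare_lib /lib/libc.so.6 /fw_dev/fgs/cf/initrd/expand/lib/libc.so.6
```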

Update #6:
So I made a bootable USB key using the target system's disk image as the base. I made minimal changes to it to get it to run on a standard PC, and added gdb and gdb's requisite libraries to it. So now libc is the same on both host and target machines, and it still hangs on me.

Final. While I know gdb 6.8 worked in the past, I can't figure out the configuration. After upgrading both gdb and gdbserver to 7.12 it worked.
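
Since the fix here was upgrading gdb and gdbserver together, a quick check of the two version banners can catch a forgotten half of the upgrade. This is only a sketch: the function names are hypothetical, and the banners in the usage comment are illustrative strings, not output captured from this system. Differing versions are not necessarily a problem in general; matching them is simply what resolved this case.

```shell
#!/bin/sh
# Hypothetical helper: pull the first dotted version number (e.g. "7.12")
# out of a version banner such as the first line of `gdb --version`.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Hypothetical helper: succeed (exit status 0) when both banners carry
# the same version number.
same_version() {
    [ "$(extract_version "$1")" = "$(extract_version "$2")" ]
}

# Usage sketch (banner strings are illustrative, not run here):
#   host_banner=$(gdb --version | head -n 1)
#   target_banner='GNU gdbserver (GDB) 7.12'   # pasted from the target
#   same_version "$host_banner" "$target_banner" || echo "versions differ"
```

The target banner is pasted in by hand because the target in this question has no working sshd, so gdbserver --version has to be run on the target itself.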

What Linux distribution are you using? Also, what kernel version are you running? — oxr463, Jul 09 '19 at 16:20
Kernel 2.6.27.5-117.fc10.i686 on the target. It's derived from Fedora, but it's really stripped down and uses BusyBox. It's sort of a big embedded system, although it uses a more-or-less normal motherboard but with added hardware (GPIO's, NVRAM, battery backed-up intrusion detection, etc). Host machine uses 32-bit Fedora 10 with the same kernel. — jhufford, Jul 09 '19 at 17:55
How about `dmesg`? Anything interesting in there? Also, how much RAM does the system have? — oxr463, Jul 09 '19 at 19:18
After it hangs, I can't run dmesg on the target because the gui's been launched, and I can't kill it with the keyboard. I tried to put sshd on it, thinking I could remote into the target after it hangs, but it's more library issues and it segfaults. — jhufford, Jul 09 '19 at 20:38
Can you launch it without the GUI, and then manually start it? — oxr463, Jul 09 '19 at 23:41
@LucasRamage I'm not sure what you mean. But I do have the target system starting up without a gui and then launch the target program with gdbserver from the .xinitrc file. — jhufford, Jul 10 '19 at 01:30

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Upgrading versions, while a very large headache, could be possible but probably should not be necessary...

This is the right option. All of the other issues you are encountering are because of this.

I've tried this in another virtual machine and I even built a fresh dedicated linux install (so no vm), rebuilt the software, and I get the same behavior. So it appears the issue is probably on the target machine's configuration.

You should build on the same version, architecture, etc. as the system you are attempting to deploy your code to.

But then when I try to set breakpoints, I get Error accessing memory address 0xb5eb60: Input/output error.

Per this answer,

Can be caused by 32/64 bit mixups. Check, for example, that you didn't attach to a 32-bit binary with a 64-bit process' ID, or vice versa.

I also tried putting the libc from the host computer onto the target machine, and it won't even boot, it gets a segfault in libc.

Don't do that. As you've found out, it won't work.

So how do I get gdb to load different libraries?

Per this question, you can use LD_LIBRARY_PATH.

answered Jul 09 '19 at 16:34

oxr463


In Update #1 I upgraded gdb as far as I could, from version 6 to 7 and still have the same issue. I can't do version 8 due to it being a legacy system with no C++11 compiler. Early on in the project I tried upgrading gcc to the latest but that immediately blew up in my face. – jhufford Jul 09 '19 at 17:59
I don't think it can be a 32/64 bit mixup, both host and target are 32-bit systems. The host is running Fedora 10, and the target is running the same kernel, but stripped down using BusyBox. No desktop or anything like that. Libc has the same name and version number on both systems, but the files aren't identical, hence the failed attempt to copy them. I'm unsure why they're different; apparently they were compiled differently. I tried LD_LIBRARY_PATH but I could never get line #1 in the backtrace in update #5 to change. Maybe I didn't get it exactly right, but I followed examples I found online. – jhufford Jul 09 '19 at 18:06
I'm giving you the bounty. You did say upgrading was the answer. I did get it working with an upgraded version of gdb. The key was that I forgot to also upgrade gdbserver... I know the old version did work at one time, we have dozens of gdb scripts and existing eeproms with the older gdbserver on it, so there's some config issue somewhere. But whatever, results are results. I'll just have to upgrade the gdb version on the state regulators build PC... – jhufford Jul 11 '19 at 04:10
Well crud... It says bounty is expired... I just now got it working. So sorry about that.. In any case, I do appreciate the help. – jhufford Jul 11 '19 at 04:12
No worries, happy to help. This is probably off-topic, but how are you managing these devices, i.e., how is software deployed to them in a production environment? – oxr463 Jul 11 '19 at 12:13
It's for poker/keno machines. There's no network. So to deploy, a tech physically puts in new media. A software set is a CF card, a bios chip that has an optionrom that hashes the executable parts of the CF card and an eprom chip that has the hash to compare to. There's another socket on the motherboard that if an eprom is present & gdbserver is on that chip, the game software launches with gdbserver. The state regulators use that so they can test out certain scenarios before approving the software. Luckily the software has a PC mode, so I could do all development locally. – jhufford Jul 11 '19 at 17:52

score 0 · Answer 2 · answered Jul 09 '19 at 13:48

Here are some interesting suggestions. Have you tried attaching strace to gdbserver to see what kind of activity is going on during the hang? As others say, it could be a good way to go one step further in figuring out the problem. You can do that with the following command on the target machine:

strace -p `pidof gdbserver`

Also sending a CONT signal to gdbserver may help when it hangs:

kill -CONT `pidof gdbserver`

answered Jul 09 '19 at 13:48

Ivan Kotov


The `kill -CONT` does nothing. I'm not sure what I'm looking at with `strace`: a bunch of `select`, `recv` and `send`. When the breakpoint is hit and everything hangs, nothing interesting is coming from strace, just `select(7, [6], NULL, [6], {1,0}) = 0 (Timeout)` is displayed repeatedly. – jhufford Jul 09 '19 at 20:11
