1

I have two identical raspberry pi 3 b+ devices, one running with raspberrypi-kernel_1.20180313-1 and the other running raspberrypi-kernel_1.20180417-1. I'm watching bluetooth events using "hcidump -R" on both devices. One device shows bluetooth events, the other does not. I've swapped the SD cards on the devices to confirm it is not related to hardware, regardless of which device the SD card is in, the one running 20180313 shows the bluetooth events and the 20180417 does not.

To debug this, I've been adding some printk statements to various points in the raspbian source pulled from git: https://github.com/raspberrypi/linux

I felt the most relevant place to start with the debugging was in the bluetooth RX code, e.g. print something for each bluetooth message send to the line discipline. Specifically, in drivers/bluetooth/hci_ldisc.c, in the hci_uart_tty_receive function, I added two lines to the beginning:

printk(KERN_ERR "AARON: In hci_uart_tty_receive with tty %p\n", tty);
dump_stack();

After rebuilding the kernels and starting the devices, on the pi running 20180313 I saw the log messages/stack traces and on the other pi I saw nothing, indicating the bluetooth RX code isn't being reached. So to further debug, I looked at the stack trace, which was:

Jul  9 21:03:18 tiltpi kernel: [    9.391137] Workqueue: events_unbound flush_to_ldisc
Jul  9 21:03:18 tiltpi kernel: [    9.391166] [<8010f664>] (unwind_backtrace) from [<8010bd1c>] (show_stack+0x20/0x24)
Jul  9 21:03:18 tiltpi kernel: [    9.391183] [<8010bd1c>] (show_stack) from [<80449c20>] (dump_stack+0xc8/0x114)
Jul  9 21:03:18 tiltpi kernel: [    9.391221] [<80449c20>] (dump_stack) from [<7f4400cc>] (hci_uart_tty_receive+0x5c/0xac [hci_uart])
Jul  9 21:03:18 tiltpi kernel: [    9.391254] [<7f4400cc>] (hci_uart_tty_receive [hci_uart]) from [<804b9bdc>] (tty_ldisc_receive_buf+0x64/0x6c)
Jul  9 21:03:18 tiltpi kernel: [    9.391273] [<804b9bdc>] (tty_ldisc_receive_buf) from [<804ba15c>] (flush_to_ldisc+0xcc/0xe4)
Jul  9 21:03:18 tiltpi kernel: [    9.391293] [<804ba15c>] (flush_to_ldisc) from [<80135934>] (process_one_work+0x144/0x438)
Jul  9 21:03:18 tiltpi kernel: [    9.391311] [<80135934>] (process_one_work) from [<80135c68>] (worker_thread+0x40/0x574)
Jul  9 21:03:18 tiltpi kernel: [    9.391328] [<80135c68>] (worker_thread) from [<8013b930>] (kthread+0x108/0x124)
Jul  9 21:03:18 tiltpi kernel: [    9.391352] [<8013b930>] (kthread) from [<80107ed4>] (ret_from_fork+0x14/0x20)

I proceeded to add printk statements for flush_to_ldisc and tty_ldisc_receive_buf, recompile, and retest. However, while I continued to see the printk message I added in hci_uart_tty_receive, I did not see the messages I added to flush_to_ldisc or tty_ldisc_receive_buf.

Upon further inspection of the kernel source, I found the stack trace didn't even make sense as the functions listed did not directly call to eachother. More specifically, in tty_buffer.c, flush_to_ldisc (towards the bottom of the stack) calls to receive_buf, which then calls to tty_ldisc_receive_buf, which will then call to hci_uart_tty_receive in hci_ldisc.c. The kernel stack doesn't have any entry for receive_buf and shows flush_to_ldisc calling directly to _tty_ldisc_receive_buf.

So I'm quite confused. I've searched through the kernel source and found no other declarations of "flush_to_ldisc" or "tty_ldisc_receive_buf" functions.

Why/how can dump_stack() be missing a stack entry? Why aren't the prink statements I've placed in the functions toward the bottom of that stack showing up while the printk statements I've placed toward the top of the stack do show up?

EDIT:

A bit more searching shows that the Linux kernel relies on gcc to do certain optimizations including automatic inlining of some functions, and hence that is likely what is happening to my stack trace. That would explain why I don't see the functions explicitly listed in the stack, but doesn't explain why the printk output doesn't show up. Any thoughts from anyone on why printk statements would show up from functions in the top of the stack but not the bottom? The rsyslog.conf file is setup with:

*.err                           -/var/log/errors.log

And all printk statements I added are like "printk(KERN_ERR "string\n");"

EDIT2: Updated question title to reflect that it is not just about absent printk output.

EDIT3: I deleted my local copy of the kernel source, pulled it again, added my printk statements, and recompiled from scratch, and I now get the printk statements showing up. It seems the code I added wasn't recompiled or linked into the kernel build. I ran "make clean" before making the kernel, but still it seems something wasn't getting compiled/linked properly. But starting clean resolved the problem.

Summary: Linux kernel makes use of gcc optimizations that will result in functions being compiled inline even when not explicitly specified in the source as inline. And when you're "sure" you've recompiled the kernel with your changes, you should start over with a clean source/build dir and try a second time before taking your issue to stack.

AaronBeers
  • 43
  • 5
  • `receive_buf` could have been inlined and it's difficult to say why some printk aren't showing up without seeing the full code. Anyway I think that the best way to find your bug would be to [bisect](https://stackoverflow.com/a/4714297) the git repository. – smeso Jul 10 '18 at 09:48
  • No explicit inline declarations in the relevant source. Even if it was inline'd I'd still expect to see the printk statement output. The printk statements are just standard printk(KERN_ERR "string\n"); and I've setup rsyslog to capture KERN_ERR messages. Appreciate the bisect suggestion, might go that route. Function signatures in drivers/tty/tty_buffer.c are: static void flush_to_ldisc(struct work_struct *work); static int receive_buf(struct tty_ldisc *ld, struct tty_buffer *head, int count); int tty_ldisc_receive_buf(struct tty_ldisc *ld, unsigned char *p, char *f, int count); – AaronBeers Jul 10 '18 at 13:32
  • *"And when you're "sure" you've recompiled the kernel with your changes, you should start over..."* -- That's commendable but a bit extreme. You could first verify that your printk() text strings exist in the uncompressed **arch/arm/boot/Image** file. Did you also verify that the .config files were identical? Also check the build time of the kernel after booting to verify that you used the new kernel image. – sawdust Jul 10 '18 at 23:01

0 Answers0