
I am trying to debug a simple instruction sequence in ARM assembly. However, single-stepping with arm-none-eabi-gdb always freezes the debugger at one specific instruction and forces me to kill the debugging session. The debugger does not let me suspend the session to check whether I entered a fault handler, so terminating it is the only option.

Some environment information

  • IDE: Eclipse
  • arm-none-eabi-gcc compiler
  • compiler flags: -mcpu=cortex-m4 -mthumb -O0 -g3 -ffunction-sections -c
  • running on an STM32F4 Discovery board with a Cortex-M4

The Problem

From my main function I call foo, a function I wrote in assembly. After pushing the non-scratch registers and LR onto the stack and reserving space for local variables, foo saves the scratch registers on the stack and then calls another assembly function, bar.

My simplified assembly file looks like this:

.syntax unified
.thumb
.text
.global foo
.global bar

bar:
push {r4,r5,r6,r7,r14} //the debugger "freezes" before executing this instruction
add r4,r5
pop {r4,r5,r6,r7,r15}

foo:
push {r4,r5,r6,r7,r14}
subs r13, r13, #124
.foo_bb1:
push {r0}
push {r1}
push {r2}
push {r3}
bl bar
pop {r4,r5,r6,r7,r15}

My header file looks like this

#ifndef FUNCTIONS_H_
#define FUNCTIONS_H_

#include "stdint.h"
void foo();
void bar();
#endif

and my main.c file looks like the following

int main()
{
    foo();
    return 0;
}

When I attach the debugger and step into foo, everything works fine until I single-step the bl bar instruction. The debugger correctly jumps to bar, but then it freezes (i.e. before executing the push instruction in bar), and I cannot do anything except terminate the session. This also means I can no longer look at the register contents or inspect the memory layout.

What I tried

The first thing that came to mind was checking whether the stack pointer is correctly aligned. The stack pointer is always 8-byte aligned: at the beginning of foo, after subs r13, r13, #124, and right before the branch to bar. To my surprise, if I remove all the push {rX} instructions, the debugger proceeds without a problem. If I add even one push instruction, the problem comes back. I cannot avoid storing r0-r3, so removing the push instructions is not a viable option. I can change the call to target functions other than bar, but that does not change the behavior either.

Edit: While creating this minimal example I discovered that there is a label before those push {rX} instructions in foo. Removing the label .foo_bb1: and keeping the push leads to no crash as well. Unfortunately, I also need the label and can not completely get rid of it.

I tried to see if I run into a Fault Handler but it does not seem to be the case. I also tried to disable interrupts with cpsid i, but it did not change the behavior. I looked at the stack layout before jumping to see if anything is corrupted, but I could not find any strange looking data. I tried to reduce the local stack region to see if the stack is full by subtracting a smaller number from r13 but without any success.

There is a somewhat related question about single-step debugging in the ST community forum. It concerns the Cortex-M7 on an STM32F7 board, so it does not really match my setup. Nevertheless, I tried the workaround suggested there, but it did not help either.

At this point I no longer know what kind of error this is. Is it an STM problem, a debugger problem, or an implementation error? If anyone has had a similar issue or knows how to tackle this problem, I would very much appreciate it :)

pota_toe
  • Does adding `.cpu cortex-m4` to your assembler code make any difference? Probably not, but worth a try. – pmacfarlane May 22 '23 at 16:23
  • Unfortunately not, but thanks for the try :) – pota_toe May 22 '23 at 16:31
  • not enough info yet, need a complete example, etc. – old_timer May 23 '23 at 00:06
  • I extended the example with the corresponding C and header files. I also added another observation I made while creating the example – pota_toe May 23 '23 at 06:24
  • cpsid i doesn't disable processor exceptions, only regular interrupts. cpsid f disables exceptions. Check the exact value you're branching to (bl bar), to make sure it has the LSB set. Disassemble the code, check how bl bar works. I mean, branch and link is supposed to have the target value somewhere, or there could be a relative jump? Not with this instruction tho, I think, bl works with jump target address in a register. Maybe you should do `mov r0, bar`, `bl r0` – Ilya May 23 '23 at 10:40
  • do you have a disassembly of this since the example is not complete enough to understand/see the problem. – old_timer May 23 '23 at 14:37
  • looks like it is your software and not stm, etc. but need a complete example, these fragments can be used in a way that will ensure failure, depending on the missing pieces. – old_timer May 23 '23 at 14:41
  • using the debugger could be the problem too, and that would be the debugger tool, of which there are many. gdb is only one part of the equation and eclipse is just a wrapper around everything: the debug hardware, the firmware version on that hardware, how that hardware is used, whether this is using swd or a rom monitor or a combination, etc. the version of gdb, who built it and how, etc. are all relevant factors if this is a debug issue. depending on the debugger you can likely telnet into openocd, bypass gdb altogether, and see what you see when stopping and/or stepping – old_timer May 23 '23 at 14:47

1 Answer


Well... I think I will get roasted for this one:

As my edit already said, after I removed the label .foo_bb1:, the debugger proceeded without errors. There was a related question here. I did not think that this had anything to do with labels or label names.

In short: using a capital 'L' right after the dot at the start of the label name, i.e. .Lfoo_bb1:, solves the issue.
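
For reference, here is a sketch of the assembly file from the question with the rename applied. The label name is the only change this answer describes. In GNU as for ELF targets, symbols starting with `.L` are local labels that are normally not written to the object file's symbol table; why the extra `.foo_bb1` symbol upsets the debugger is not established here. The `add r13, r13, #140` line is an assumption that is not part of the question's code: it releases the four pushed scratch registers and the 124-byte local area so that the final `pop` restores the registers saved on entry.

```asm
.syntax unified
.thumb
.text
.global foo
.global bar

bar:
    push {r4,r5,r6,r7,r14}
    add r4,r5
    pop {r4,r5,r6,r7,r15}     @ return by popping the saved LR into PC

foo:
    push {r4,r5,r6,r7,r14}    @ save r4-r7 and LR (5 * 4 = 20 bytes)
    subs r13, r13, #124       @ reserve the local area, as in the question
.Lfoo_bb1:                    @ was ".foo_bb1:"; the ".L" prefix makes it an assembler-local label
    push {r0}
    push {r1}
    push {r2}
    push {r3}                 @ r0-r3 saved (4 * 4 = 16 bytes)
    bl bar
    add r13, r13, #140        @ ASSUMPTION, not in the question: release 16 + 124 bytes
    pop {r4,r5,r6,r7,r15}     @ restore r4-r7 and return
```

Branches inside the file can still target `.Lfoo_bb1`; the name just stays private to the assembler.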

pota_toe
  • `.Lfoo_bb1:` labels are "local" (file-scope) and don't get put in the symbol table at all. But unless you use `.globl .foo_bb1`, it's also file scope. But will go in the symbol table of the `.o`, and maybe end up somewhere the debugger can see it, if that explains anything. (I didn't try to read and understand the whole question.) https://sourceware.org/binutils/docs/as/L.html / https://sourceware.org/binutils/docs/as/Symbol-Names.html – Peter Cordes May 23 '23 at 06:54
  • The `bar` makes no sense. You do not need to store anything on the stack. Also the final line in the question is a `push` and not `pop`. [Arm link and frame registers](https://stackoverflow.com/questions/15752188/arm-link-register-and-frame-pointer). Since you subtracted a large value from the stack, even with `pop`, the layout is incorrect and you return to garbage. The debugger is trying to set temporary hardware breakpoints but can not understand what is going on. more [info here](https://stackoverflow.com/questions/57451208/whats-the-structure-of-arm-extab-entry-in-armcc) – artless noise May 23 '23 at 22:29
  • For instance, `.cantunwind` may help here... to make the debugger work. – artless noise May 23 '23 at 22:30
  • `bar` is just a minimal example. In my real code a lot more happens so I need all of the registers I pushed after the function call. I changed the final line to `pop`. This was just a typo. Thanks for correcting. I do not see why subtracting the value from the stack should be harmful. I compiled some C code and looked at it in compiler explorer. The same value gets subtracted there and everything works perfectly fine when debugging. – pota_toe May 24 '23 at 06:41
  • @pota_toe: If you build the code in the question now into an executable, can you actually reproduce the problem you're talking about? If not, it's not a [mcve], and you should roll it back to the earlier code that did have a bug. Or if that was also fake code that didn't actually do what your question was asking about because you never even ran it, that's a waste of everyone's time. – Peter Cordes May 24 '23 at 06:47
  • Making an actual MCVE will often narrow down the possible bug, and sometimes lead you to spot some silly mistake you didn't realize existed (solving your own problem). But if you do end up posting a question, it's critically important that the test-case be real, especially when you don't understand why something weird happens. – Peter Cordes May 24 '23 at 06:49
  • Of course I made sure that the problem is still reproducible, and I checked it. I just wanted to create a **minimal** example because the original code is much bigger and hard to get an overview of. So the test case is real, minimal and reproducible in the sense that it 1) reproduces the behavior I encountered, 2) includes all instructions that were executed before running into the problem, but 3) does not include instructions that would be executed **after** the debugger fails, to keep the overview compact – pota_toe May 24 '23 at 06:59