How to retain a stacktrace when a Cortex-M3 has gone into hardfault?

Question

Using the following setup:

Cortex-M3 based µC
gcc-arm cross toolchain
using C and C++
FreeRTOS 7.5.3
Eclipse Luna
Segger J-Link with JLinkGDBServer
Code Confidence FreeRTOS debug plugin

Using JLinkGDBServer with Eclipse as the debug frontend, I always get a nice stacktrace when stepping through my code. With the Code Confidence FreeRTOS tools (Eclipse plugin), I also see the stacktraces of all threads that are not currently running (without that plugin, I see only the stacktrace of the active thread). So far so good.

But when my application falls into a hardfault, the stacktrace is lost. I know the technique for finding the code address that caused the hardfault (as seen here), but that is very poor information compared to a full stacktrace.

OK, sometimes there is no way to retain a stacktrace when falling into a hardfault, e.g. when the stack has been corrupted by the faulty code. But if the stack is healthy, I think that getting a stacktrace might be possible (isn't it?).

I think the reason for losing the stacktrace in a hardfault is that the Cortex-M3 architecture automatically switches the stack pointer from PSP to MSP. One idea is to set the MSP to the previous PSP value (possibly with some additional stack preparation).

Any suggestions on how to do that or other approaches to retain a stacktrace when in hardfault?

Edit 2015-07-07, added more details.

I use this code to provoke a hardfault:

__attribute__((optimize("O0"))) static void checkHardfault() {
    volatile uint32_t* varAtOddAddress = (uint32_t*)-1;
    (*varAtOddAddress)++;
}

When stepping into checkHardfault(), my stacktrace looks good like this:

gdb-> backtrace
#0  checkHardfault () at Main.cxx:179
#1  0x100360f6 in GetOneEvent () at Main.cxx:185
#2  0x1003604e in executeMainLoop () at Main.cxx:121
#3  0x1001783a in vMainTask (pvParameters=0x0) at Main.cxx:408
#4  0x00000000 in ?? ()

When I run into the hardfault (at (*varAtOddAddress)++;) and find myself inside HardFault_Handler(), the stacktrace is:

gdb-> backtrace
#0  HardFault_Handler () at Hardfault.c:312
#1  <signal handler called>
#2  0x10015f36 in prvPortStartFirstTask () at freertos/portable/GCC/ARM_CM3/port.c:224
#3  0x10015fd6 in xPortStartScheduler () at freertos/portable/GCC/ARM_CM3/port.c:301
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

See a couple of answers that I gave on similar questions, [here](http://stackoverflow.com/a/21149143/1382251) and [here](http://stackoverflow.com/a/22423647/1382251) (this one is a little more detailed, as it refers to a specific issue outside the scope of your question). — barak manos, Jul 06 '15 at 21:04
If memory serves correctly, then PC and LR store the addresses of the last two functions in the call-stack before the interrupt has occurred, and R0 thru R3 store the arguments passed to these functions. — barak manos, Jul 06 '15 at 21:08
Your suggested solution looks the same as the one described on [freertos.org](http://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html) (as I also mentioned in my question). It only gives a hint to the `PC` that finally caused the hardfault (and one more calling level via `LR`, as I now learned from your comment), but it won't provide a stacktrace. — Joe, Jul 06 '15 at 21:59
@Joe were you able to make progress on this? As it turns out, I'm in a similar situation — bytefire, Jan 19 '16 at 09:29
@bytefire: In the meantime, I have a perfectly working hardfault handler that was written by another company. Unfortunately, I'm not allowed to publish the code here because it's not open source. Sorry about that. — Joe, Jan 19 '16 at 13:48
Yep, it provides the stack trace down to the FreeRTOS scheduler (if the stack itself wasn't corrupted). As far as I know, the current version of the Code Confidence Eclipse plugin now has some functionality of this kind, which the developer named [Exception Handling](http://www.codeconfidence.com/doc/ecos/current/ref/kernel-exceptions.html), but I have never used it so far. — Joe, Jan 19 '16 at 21:45

score 1 · Answer 1 · answered Sep 06 '15 at 05:00

The quickest way to get the debugger to give you the details of the state prior to the hard fault is to return the processor to the state prior to the hard fault.

In the debugger, write a script that takes the information from the various hardware registers and restores PC, LR, and R0-R14 to their state just prior to the hard fault, then do your stack dump.
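
As a rough illustration of that restore step, the sketch below decodes the eight words a Cortex-M3 stacks on exception entry and derives the SP, LR, and PC values that were live just before the fault. The names `StackedFrame`, `PreFaultContext`, and `recover_context` are hypothetical, and the code assumes the basic (non-FPU) frame and that the caller already knows the frame's address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Words pushed by the Cortex-M3 hardware on exception entry (basic frame). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} StackedFrame;

/* Hypothetical container for the values a debugger needs to unwind. */
typedef struct {
    uint32_t sp, lr, pc;
} PreFaultContext;

/* frame_addr is the target address at which the frame was found
 * (the MSP or PSP value at handler entry); this is an assumption
 * the caller has to satisfy. */
PreFaultContext recover_context(const StackedFrame *frame, uint32_t frame_addr)
{
    PreFaultContext ctx;

    assert(frame != NULL);

    /* The pre-fault SP sits just above the 8 stacked words. */
    ctx.sp = frame_addr + 8u * 4u;

    /* Bit 9 of the stacked xPSR marks a 4-byte alignment pad. */
    if (frame->xpsr & (1u << 9)) {
        ctx.sp += 4u;
    }

    ctx.lr = frame->lr; /* return address of the faulting function */
    ctx.pc = frame->pc; /* address of the faulting instruction */
    return ctx;
}
```

A debugger script could write these three values back into its register view before asking for a backtrace. The callee-saved registers R4-R11 are not part of the hardware frame and are assumed here to be still intact.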

Of course, this isn't always helpful when you end up at the hard fault because of popping stuff off of a blown stack or stomping on stuff in memory. You generally tend to corrupt a bunch of the important registers, return back to some crazy spot in memory, and then execute whatever's there. You can end up hard faulting many thousands (millions?) of cycles after your real problem happens.

score 0 · Answer 2 · answered Dec 04 '19 at 23:49

Consider using the following gdb macro to restore the register contents:

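define hfstack
    # Run while $sp still points at the exception frame (handler entry).
    set $frame_ptr = (unsigned *)$sp
    # $frame_ptr is an unsigned *, so offsets below count 32-bit words.
    if $lr & 0x10
        # basic frame: 8 words
        set $sp = $frame_ptr + 8
    else
        # extended (FPU) frame: 26 words
        set $sp = $frame_ptr + 26
    end
    set $lr = $frame_ptr[5]
    set $pc = $frame_ptr[6]
    bt
end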

document hfstack
Set the correct stack context after a hard fault on Cortex-M.
end
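
The macro reads the frame from `$sp`, which is the main stack (MSP) while the handler runs. FreeRTOS tasks on Cortex-M3 run on the process stack (PSP), so for a fault inside a task the frame may sit on PSP instead. Below is a minimal C sketch of the EXC_RETURN decoding involved; the names `FrameStack`, `frame_stack`, and `frame_words` are hypothetical and only illustrate the bit tests.

```c
#include <assert.h>
#include <stdint.h>

/* Which stack the hardware pushed the exception frame onto. */
typedef enum { FRAME_ON_MSP, FRAME_ON_PSP } FrameStack;

/* exc_return is the value found in LR at handler entry. */
FrameStack frame_stack(uint32_t exc_return)
{
    /* EXC_RETURN values always have the top nibble set to 0xF. */
    assert((exc_return & 0xF0000000u) == 0xF0000000u);

    /* Bit 2 set: the interrupted code was using PSP. */
    return (exc_return & (1u << 2)) ? FRAME_ON_PSP : FRAME_ON_MSP;
}

/* Size of the stacked frame in 32-bit words. */
uint32_t frame_words(uint32_t exc_return)
{
    /* Bit 4 set: basic 8-word frame; clear: 26-word FPU frame. */
    return (exc_return & (1u << 4)) ? 8u : 26u;
}
```

For a fault taken from a task, LR typically holds 0xFFFFFFFD at handler entry, so `frame_stack` would report PSP and `frame_words` would report 8. In that case the frame pointer for a macro like the one above would have to come from PSP, not from `$sp`.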
