
I'm working on an STM32G0B0 (Cortex-M0+) and recently ran into a problem with the IWDG (independent watchdog).
Although I never got it to work properly in windowed mode, it works well as a normal watchdog.
I set a tight 10 ms timeout to catch any glitches during development.
It never triggered and the code ran properly, until today :))))

So I would like to know whether my code's execution time was the problem (hard to believe),
or whether a bug hit me and landed in the HardFault handler.
Finally, it could be an unimplemented vector, but that shouldn't happen unless a bug occurred.

Similar to this thread
How do I debug unexpected resets in a STM32 device?

And this implementation for Cortex M3, M4
https://blog.frankvh.com/2011/12/07/cortex-m3-m4-hard-fault-handler/

I would like to know if there is a good way on the M0+ to save at least the address that caused the jump to the HardFault handler.

It would be nice to save more for debugging purposes. I want to print that information after the next reset.

Thanks for hints!

Note: I use bare-metal C without any SDK, except for definitions and ARM low-level code.

EDIT
Following some guidance from here
https://community.silabs.com/s/article/debug-a-hardfault?language=en_US

There is a handler that they say works on the M0, but it doesn't:

void debugHardfault(uint32_t *sp)
{
    uint32_t r0  = sp[0];
    uint32_t r1  = sp[1];
    uint32_t r2  = sp[2];
    uint32_t r3  = sp[3];
    uint32_t r12 = sp[4];
    uint32_t lr  = sp[5];
    uint32_t pc  = sp[6];
    uint32_t psr = sp[7];
    while(1);
}

__attribute__( (naked) )
void HardFault_Handler(void)
{
    __asm volatile
    (
        "mrs r0, msp                                   \n"
        "mov r1, #4                                    \n"
        "mov r2, lr                                    \n"
        "tst r2, r1                                    \n"
        "beq jump_debugHardfault                       \n"
        "mrs r0, psp                                   \n"
        "jump_debugHardfault:                          \n"
        "ldr r1, debugHardfault_address                \n"
        "bx r1                                         \n"
        "debugHardfault_address: .word debugHardfault  \n"
    );
}


Error: selected processor does not support `mrs r0,msp' in Thumb mode

EDIT2 Found a handler for M0 at Segger
https://wiki.segger.com/Cortex-M_Fault

Implemented like this for M0

    .syntax unified
    .cpu cortex-m0plus
    .fpu softvfp
    .thumb

    .global HardFault_Handler
    .global NMI_Handler
    .global PendSV_Handler
    .global SVC_Handler


 HardFault_Handler:
 BusFault_Handler:
 UsageFault_Handler:
 MemManage_Handler:
 PendSV_Handler:
 SVC_Handler:
 NMI_Handler:

         ;// This version is for Cortex M0
         movs   R0, #4
         mov    R1, LR
         tst    R0, R1            ;// Check EXC_RETURN in Link register bit 2.
         bne    Uses_PSP
         mrs    R0, MSP           ;// Stacking was using MSP.
         b      Pass_StackPtr
 Uses_PSP:
         mrs    R0, PSP           ;// Stacking was using PSP.
 Pass_StackPtr:
         ldr    R2,=HardFaultHandler
         bx     R2                ;// Stack pointer passed through R0. 

         .end

The IWDG (watchdog) was disabled and I triggered a hard fault manually like this:

int _UnalignedAccess(void) {
    int r;
    volatile unsigned int* p;

    p = (unsigned int*)0x20000001; // not aligned
    r = *p;
    return r;
}

Collecting function

void HardFaultHandler(unsigned int* pStack) {

    HardFaultRegs.SavedRegs.r0 = pStack[0];  // Register R0
    HardFaultRegs.SavedRegs.r1 = pStack[1];  // Register R1
    HardFaultRegs.SavedRegs.r2 = pStack[2];  // Register R2
    HardFaultRegs.SavedRegs.r3 = pStack[3];  // Register R3
    HardFaultRegs.SavedRegs.r12 = pStack[4];  // Register R12
    HardFaultRegs.SavedRegs.lr = pStack[5];  // Link register LR
    HardFaultRegs.SavedRegs.pc = pStack[6];  // Program counter PC
    HardFaultRegs.SavedRegs.psr.byte = pStack[7];  // Program status word PSR
}

It is still not working properly. The hard fault is triggered, but my function is not called at all. Instead, there is a big crash and a reset (no watchdog).

Any help appreciated!

  • One option would be to disable the IWDG, and use another timer that wraps around after 10ms (and generates an interrupt). Change your "reset the watchdog" to "reset the timer". Then put a breakpoint in the ISR for the timer. – pmacfarlane Apr 20 '23 at 20:26
  • It might also be worth checking what the reset cause was. The information is in the `RCC_CSR` register. There are HAL macros like `__HAL_RCC_GET_FLAG()` to abstract that. Most of my projects start by logging the reset cause on a UART, so I can detect flaky behaviour from the start. – pmacfarlane Apr 20 '23 at 20:30
  • Instead of searching the net, install STM32CubeIDE; it has a fault analyzer which does all the hard work for you, collecting the data from the stack and registers – 0___________ Apr 20 '23 at 21:01
  • @pmacfarlane Good point! IWDG doesn't have a handler but WWDG does! I will change to that one! – yo3hcv Apr 20 '23 at 21:18
  • @0_________ Very good point, thanks! I already work in their IDE. The problem is that I want to do this at runtime – yo3hcv Apr 20 '23 at 21:51
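
The reset-cause check suggested in the comments could look roughly like the sketch below. It is only a sketch under assumptions: `reset_cause_name` is a hypothetical helper, and the bit positions are what I believe the STM32G0 `RCC_CSR` layout to be, so they should be verified against the reference manual or the CMSIS device header. The function takes the raw register value as a parameter so it can be tried on a PC as well.

```c
#include <stdint.h>

/* Assumed RCC_CSR reset-flag bit positions for the STM32G0 family.
   Verify against the reference manual before relying on them. */
#define CSR_OBLRSTF  (1u << 25)  /* option byte loader reset */
#define CSR_PINRSTF  (1u << 26)  /* NRST pin reset */
#define CSR_PWRRSTF  (1u << 27)  /* power-on / brown-out reset */
#define CSR_SFTRSTF  (1u << 28)  /* software reset */
#define CSR_IWDGRSTF (1u << 29)  /* independent watchdog reset */
#define CSR_WWDGRSTF (1u << 30)  /* window watchdog reset */
#define CSR_LPWRRSTF (1u << 31)  /* low-power reset */

/* Hypothetical helper: map a raw RCC_CSR value to a short name.
   The pin flag is checked last because it is usually set together
   with the other flags. */
const char *reset_cause_name(uint32_t csr)
{
    if (csr & CSR_IWDGRSTF) return "IWDG";
    if (csr & CSR_WWDGRSTF) return "WWDG";
    if (csr & CSR_SFTRSTF)  return "SOFTWARE";
    if (csr & CSR_LPWRRSTF) return "LOW_POWER";
    if (csr & CSR_OBLRSTF)  return "OPTION_BYTES";
    if (csr & CSR_PWRRSTF)  return "POWER";
    if (csr & CSR_PINRSTF)  return "NRST_PIN";
    return "UNKNOWN";
}
```

On the target this would presumably be called as `reset_cause_name(RCC->CSR)` early in `main()`, followed by setting the RMVF bit so the flags are cleared for the next boot.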

2 Answers


This doesn't solve my initial problem of logging at runtime, so I'll leave the question open in case someone has a good answer about saving hard fault information.

But thanks to @0___________ I found and fixed the bug using the "Fault Analyzer" in STM32's IDE.

Here is what I found: a pointer was redirected to some junk structure because it was loaded from a different memory location than expected.

/* Initially I had this
   But turned out that DRV.Tx.Cmd itself was NULL at some point
   So ->OnFailCmd happened to be valid, taken from address 0 + some struct offset 
   which was somewhere in Main Flash Memory
   
*/
if (DRV.Tx.Cmd->OnFailCmd) { // wrong way
    DRV.Tx.Cmd = DRV.Tx.Cmd->OnFailCmd; // wrongly loaded with junk, etc...
}


if (DRV.Tx.Cmd && DRV.Tx.Cmd->OnFailCmd) { // correct way
    DRV.Tx.Cmd = DRV.Tx.Cmd->OnFailCmd; 
}

Now the problem is that at runtime this is extremely hard to catch. Execution goes through a NULL struct pointer, but the members still read as some junk because of what sits in MCU memory at (0 + offset).

The real crash is unexpected:

  • some jump to invalid address
  • some load from an invalid (misaligned) address (my case)

I need some kind of log of the last 2 or 3 PC jumps.
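
One software-only approximation of such a log is a small "breadcrumb" ring buffer that the code marks by hand at interesting points. This is not a hardware PC trace, and it was not used to find the bug above. `trail_t`, `trail_mark` and `trail_recent` are hypothetical names for a minimal sketch. On a target the buffer would have to sit in RAM that is not cleared at startup, so it can be read after the reset.

```c
#include <stdint.h>

#define TRAIL_DEPTH 4u  /* number of marks kept; a power of two */

typedef struct {
    uint32_t entries[TRAIL_DEPTH]; /* most recent checkpoint ids */
    uint32_t head;                 /* total number of marks recorded */
} trail_t;

/* Record a checkpoint id, overwriting the oldest entry when full. */
void trail_mark(trail_t *t, uint32_t id)
{
    t->entries[t->head % TRAIL_DEPTH] = id;
    t->head++;
}

/* Return the n-th most recent mark (0 = newest),
   or 0 if that entry was never recorded or has been overwritten. */
uint32_t trail_recent(const trail_t *t, uint32_t n)
{
    if (n >= TRAIL_DEPTH || n >= t->head) {
        return 0u;
    }
    return t->entries[(t->head - 1u - n) % TRAIL_DEPTH];
}
```

Calling `trail_mark()` with a distinct nonzero id at the entry of each state or command handler would leave the last few ids available to a fault handler or to the next boot.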


I'm in the process of creating a general purpose fault logger on the ARM M0 (TLE984x). I'm a total noob on the ARM so it's been an adventure. I'm trying to use the same basic idea I previously used on an MC98S12Z. The basic idea should work; I now have to figure out the stack to retrieve the ISR/NMI return address.

The way I did it previously was to set aside a small area of RAM that is uninitialized at boot, referred to as KAM (Keep-Alive Memory). For any fatal exception (hard fault or fatal application exception), the handler would capture the NMI/ISR return address and/or other info pertinent to the type of exception and store it in KAM, then force a watch dog reset.

In the following boot, the init code examines KAM for validity (CRC-16). If invalid, it's "formatted". If valid and contains exception data, the exception data would get logged to non-volatile memory and then KAM is cleared and the application launches normally. This method allowed me to capture all relevant data for hard faults as well as fatal application exceptions.
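
The store-then-validate cycle described above can be sketched roughly as follows. This is a minimal sketch that also runs on a PC, not my actual code: the `kam_record_t` layout, the function names, and the choice of CRC-16/CCITT-FALSE are assumptions for illustration. On a target the record would live in memory the startup code does not clear (for example a `.noinit` linker section, which is an assumption about the linker script), and the indices 5 to 7 assume the usual Cortex-M stacked frame order r0-r3, r12, lr, pc, psr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical KAM record; field names are illustrative only. */
typedef struct {
    uint32_t pc;      /* stacked return address of the faulting context */
    uint32_t lr;      /* stacked link register */
    uint32_t psr;     /* stacked program status register */
    uint16_t pending; /* nonzero while a fault record waits to be logged */
    uint16_t crc;     /* CRC-16 over all bytes before this field */
} kam_record_t;

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

static uint16_t kam_crc(const kam_record_t *r)
{
    return crc16_ccitt((const uint8_t *)r, offsetof(kam_record_t, crc));
}

/* Fault handler side: save the stacked lr, pc and psr and seal the record. */
void kam_store_fault(kam_record_t *r, const uint32_t *frame)
{
    r->lr = frame[5];
    r->pc = frame[6];
    r->psr = frame[7];
    r->pending = 1u;
    r->crc = kam_crc(r);
}

/* Boot side: returns 1 and copies the record to *out if a valid fault
   record is pending. In every case the record is then "formatted". */
int kam_take_fault(kam_record_t *r, kam_record_t *out)
{
    int found = (r->crc == kam_crc(r)) && (r->pending != 0u);
    if (found) {
        *out = *r;
    }
    r->pc = 0u;
    r->lr = 0u;
    r->psr = 0u;
    r->pending = 0u;
    r->crc = kam_crc(r);
    return found;
}
```

The fault handler would call `kam_store_fault()` and then force a reset; the init code would call `kam_take_fault()` before anything else touches that memory and write the returned record to non-volatile storage.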

On my ARM M0 (TLE984x), the BSL always wipes all of RAM so I can't use RAM for KAM, which sucks. I'm using some peripheral config registers that survive a warm boot for KAM.

Good luck, and if you can offer advice on the stack processing, I'd love to hear it.
