According to the ARMv7-M and ARMv8-M reference manuals, the exception stack frame is formed on the currently active stack (MSP or PSP, depending on what the exception interrupted).

This decision looks illogical to me: every process stack must reserve space for an exception stack frame, which can be large, especially when the FPU and security extensions are used. More importantly, it leaves at least one question unanswered: how do you isolate process stack overflows from the rest of the system?

Suppose you have an ARMv8-M platform (e.g. Cortex-M33) running an unprivileged process with MPU restrictions enforced. The process has a single MPU region for its stack, and the PSPLIM register is also set. The process is running near its stack limit, and the remaining stack space is insufficient to hold an exception frame.

Now a peripheral interrupt arrives. Most likely you will get a UsageFault with the STKOF flag set. This is where the problems start. First, you missed the original exception. Most likely it is still pending and you will get it again. But how do you recover?

UsageFault handling will be subject to the same stack limits. There is still no space for an exception frame. HardFault can ignore stack limits, but this does not make the situation any better: an ignored stack limit means that memory beyond the stack is now corrupted. You could probably reserve some space beyond PSPLIM specifically for the HardFault frame, so that at least no other memory gets corrupted.

Is there a safe way to deal with this situation? The system should remain consistent and operational regardless of bugs (or malicious behavior) in an unprivileged process.

artless noise
Maxim
  • Maxim - it is not a big processor, only a very simple core. Two stacks are basically needed to implement RTOSes, and you should not expect the protection mechanisms known from big processors. How to recover? You do not recover from it. Normally those exceptions are used to set peripherals (for example, to stop engines) and restart. – 0___________ Jun 20 '23 at 21:37
  • The whole purpose of MPUs, privilege isolation, stack limit registers and all that stuff is to make the system safe and isolate failures. ATmega is a simple core. Cortex-M33 is anything but a *simple* core - it has extensive process isolation facilities. Specific MCUs like STM32U5 extend such facilities even to peripherals like DMA controllers. The question is how to implement such isolation correctly. – Maxim Jun 20 '23 at 21:44
  • It is very simplified, and it was never intended to work as on the big processors – 0___________ Jun 21 '23 at 14:58
  • You can use lazy saving. Write the ISR so that it checks which stack is in use and if the PSP, switch to MSP before floating point, etc. operations are performed. If your MSP is blowing up in the ISR, you need more resources. Of course, you need to restore the stack before return. – artless noise Jun 21 '23 at 20:58
  • Another way would be to reserve memory beyond the PSP that is only accessible in privileged modes (exceptions). The excess reserve should be enough to accommodate whatever exception model is in use. I now support the close reason, as it seems that whatever suggestions you get, you are not willing to compromise on whatever model you have. For instance, now you may say, I can not reserve stack, etc. – artless noise Jun 22 '23 at 12:12
  • I can reserve stack, spend a bit more resources or do something like that - this is what I suggested initially. The only requirement is functional correctness - it should not be possible to crash the system from unprivileged code. MPU-protected stack areas should work, yes. – Maxim Jun 22 '23 at 13:51
  • Well, I thought further about that. If the 'user' space can update the PSP directly by writing anything, then they can force it to any memory and wait for an interrupt. As the MPU is physical based, the value written is accessible and the saved registers can be written at an arbitrary point (if everything I understand is correct). So, it does seem like an unresolvable security issue. For safety, where people generally conform to only pushing/popping values, the extended memory might work. The Cortex-M model of automatically saving registers on exceptions is the root cause. – artless noise Jun 23 '23 at 12:14
  • [This article](https://blog.stratifylabs.dev/device/2013-10-09-Context-Switching-on-the-Cortex-M3/) seems to contradict your assertion 'exception stack frame is formed on currently active stack'. Only the MSP being used makes sense, in that a process stack can be switched. Having a separate stack per process is a standard structure. So it makes sense for a pre-emptive timer to switch tasks by updating the stack in use. – artless noise Jun 23 '23 at 12:34
  • I haven't written Cortex-M scheduling, just TrustZone and plain Cortex-A. I would think there is either an architectural difference between some Cortex-M (which stack interrupts use) or some option to control this. I see people saying that either stack can be used by interrupts; but this is not secure to arbitrary code. For many Cortex-M, it is fixed functionality, so all code on the system should be trusted. For an IoT type device, this may not be the case. – artless noise Jun 23 '23 at 12:45

2 Answers

The configuration you suggest, unprivileged code with the MPU active and running on the Main stack, requires careful allocation of stack space. The main stack must have enough space to support exceptions from the fixed priority exceptions (NMI, HardFault) and any other nesting of system exceptions and interrupts. Depending upon how system exception and interrupt priorities are assigned, this can add up to substantial space.

The situation is more predictable if Handler mode processing is placed on the Main stack and the unprivileged Thread mode code uses the Process stack. In that case only one level of exception stack frame is needed on the Process stack, because once an exception happens, say an interrupt, any other exceptions of higher priority use the Main stack. This configuration is easier to understand, and its stack usage is easier to set up.

I usually assign all the system exceptions the same priority, which is higher than interrupts, which are higher than either SVC or PendSV. Then the Main stack must have space for 3 exception frames, plus however many levels of nested interrupts (I usually use only 1, so no interrupt nesting), plus the stack usage of the handlers (which the compiler will estimate). That leaves the Process stack to run the unprivileged code (again the compiler will help) plus one exception frame.
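The arithmetic above can be written down as a small host-side sketch. The frame sizes are my assumption from the ARMv8-M frame layouts (8 words for the basic frame, 18 more words when FP context is stacked, plus a possible 4-byte alignment pad); the extra state pushed by the security extension is deliberately not counted. The helper names are hypothetical, not part of CMSIS or any RTOS.

```cpp
#include <cassert>   // assert, for quick host-side checks
#include <cstddef>

// Hardware-stacked frame sizes in bytes (assumed, verify against your core).
constexpr std::size_t kBasicFrame    = 8 * 4;   // R0-R3, R12, LR, return address, RETPSR
constexpr std::size_t kFpExtension   = 18 * 4;  // S0-S15, FPSCR, one reserved word
constexpr std::size_t kExtendedFrame = kBasicFrame + kFpExtension;
constexpr std::size_t kAlignmentPad  = 4;       // frame is kept 8-byte aligned

// Worst-case size of one hardware-stacked frame.
constexpr std::size_t worst_case_frame(bool fpu_context) {
    return (fpu_context ? kExtendedFrame : kBasicFrame) + kAlignmentPad;
}

// Main stack: one frame per level of handler preemption, plus handler usage.
constexpr std::size_t main_stack_bytes(std::size_t nested_frames,
                                       std::size_t handler_usage,
                                       bool fpu_context) {
    return nested_frames * worst_case_frame(fpu_context) + handler_usage;
}

// Process stack: the thread's own usage plus exactly one exception frame.
constexpr std::size_t process_stack_bytes(std::size_t thread_usage,
                                          bool fpu_context) {
    return thread_usage + worst_case_frame(fpu_context);
}
```

For example, `main_stack_bytes(3, 256, false)` models three frames with 256 bytes of handler usage and no FPU context, and `process_stack_bytes(128, true)` models a thread that needs 128 bytes of its own and uses the FPU.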

I'm not sure what form of recovery from system exceptions you require, but I treat them all as unrecoverable and just do the best I can to save state that can be examined after a reset.

andy mango
  • My configuration implied that unprivileged code is running on its own stack with PSP. The primary issue here is that the RTOS must be able to recover from bugs or malice in unprivileged code when the stack is exhausted and insufficient to hold anything, even a single frame. – Maxim Jun 22 '23 at 10:37
  • If the unprivileged code takes an exception and that causes its stack limits to be exceeded, then the resulting UsageFault exception will push its frame on the Main stack (assuming the UsageFault exception is a higher priority than the exception that caused the error) and you will have allocated sufficient Main stack space to deal with it. But it's still not clear what recovery you can expect. An RTOS might "disable" the task or restart it, but it would seem unlikely to me that the system would still function properly in that case. – andy mango Jun 22 '23 at 16:22
  • Does PSP->MSP stack switch occur if stack limit is violated on initial (hardware) context stacking of the first arriving exception? – Maxim Jun 22 '23 at 18:32
  • It is my understanding that the switch from Thread mode to Handler mode is made when the exception becomes active. I cannot quote you chapter and verse from some ARM documentation (despite the hours I've spent reading them), but I'm sure you'll find it somewhere. Otherwise, the core would be "trapped" as you suggest and end up in HardFault or lock up whenever a stack limit was exceeded for unprivileged code. The ARM guys would have thought this situation through. – andy mango Jun 22 '23 at 20:07

TL;DR

The stack frame is not written. You lose the context of the currently executing task. Inaccessible memory is not corrupted. A UsageFault (for a stack limit violation) or a MemManage fault (for an MPU violation) is taken instead of the original exception. This behavior is well documented in the ARM reference manual. The invalid stack frame is signalled by the MMFSR.MSTKERR or UFSR.STKOF bit, depending on the exception.

Test program

// Configuration defines:
// #define TESTCASE 0                // 0, 1 or 2
// #define ENABLE_STACK_LIMIT        // Enables SPLIM registers
// #define ENABLE_MPU                // Enables MPU in unprivileged mode

#include <stm32u5xx.h>
#include <cstring>

enum {
    MSP = 0x20001000,   // Main stack pointer
    MSS = 32,           // Main stack size
    PSP = 0x20000F80,   // Process stack pointer
    PSS = 32,           // Process stack size
    XSO = 0x800,        // Offset of stack area from MSP
    XSS = 0x1000,       // Total size of stack area

    FLASH_START     = 0x08000000,
    FLASH_END       = 0x08010000
};

#define EXCEPTION_STUB(func)                                                                            \
    extern "C" [[gnu::naked]] void func() {                                                             \
        __asm volatile (                                                                                \
            "ldr r0, =$0xDDCCBBAA\n"                                                                    \
            "push {r0}\n"               /* Push marker value to stack to see it in the debugger */      \
            "add sp, 4\n"               /* Restore stack pointer after push */                          \
            "bkpt\n"                                                                                    \
            "bx lr\n"                                                                                   \
            ::: "r0", "memory"                                                                          \
        );                                                                                              \
    }

EXCEPTION_STUB(HardFault_Handler)
EXCEPTION_STUB(BusFault_Handler)
EXCEPTION_STUB(MemManage_Handler)
EXCEPTION_STUB(UsageFault_Handler)
EXCEPTION_STUB(SVC_Handler)

int main() {
    memset((void *) (MSP - XSS), 0x00, XSS + XSO);
    memset((void *) (MSP - MSS), 0x55, MSS);
    memset((void *) (PSP - PSS), 0xAA, PSS);

    SCB->SHCSR = SCB_SHCSR_USGFAULTENA_Msk | SCB_SHCSR_MEMFAULTENA_Msk | SCB_SHCSR_BUSFAULTENA_Msk;

#if defined(ENABLE_MPU)
    /* Regions must be 32-byte aligned to meet MPU requirements */
    static_assert(((PSP - PSS) & 0x1F) == 0);
    static_assert((PSP & 0x1F) == 0);
    static_assert((FLASH_START & 0x1F) == 0);
    static_assert((FLASH_END & 0x1F) == 0);

    /* Region 0: stack, RW, execute-never */
    MPU->RNR = 0;
    MPU->RBAR = (PSP - PSS) | (0b10 << MPU_RBAR_SH_Pos) | (0b01 << MPU_RBAR_AP_Pos) | MPU_RBAR_XN_Msk;
    MPU->RLAR = ((PSP - 1) & MPU_RLAR_LIMIT_Msk) | MPU_RLAR_EN_Msk;

    /* Region 1: flash, RO, executable */
    MPU->RNR = 1;
    MPU->RBAR = FLASH_START | (0b10 << MPU_RBAR_SH_Pos) | (0b11 << MPU_RBAR_AP_Pos);
    MPU->RLAR = ((FLASH_END - 1) & MPU_RLAR_LIMIT_Msk) | MPU_RLAR_EN_Msk;

    MPU->MAIR0 = 0b01000100;    // Normal memory, non-cacheable
    MPU->CTRL = MPU_CTRL_ENABLE_Msk | MPU_CTRL_PRIVDEFENA_Msk;
#endif

    __set_MSP(MSP);
    __set_PSP(PSP);

#if defined(ENABLE_STACK_LIMIT)
    __set_MSPLIM(MSP - MSS);
    __set_PSPLIM(PSP - PSS);
#endif

    __set_CONTROL(__get_CONTROL() | CONTROL_SPSEL_Msk | CONTROL_nPRIV_Msk);
    __ISB();

#if TESTCASE == 0
    /* Stack pointer stays valid in this test case */
    /* Decrement it so stack frame (32 bytes) won't fit */
    __asm volatile ("sub sp, 4");
#elif TESTCASE == 1
    /* Stack pointer is manually adjusted to cause stack overflow */
    __asm volatile (
        "ldr r0, =$0x20000F00\n"
        "mov sp, r0\n"
        "isb\n"
        ::: "r0", "memory"
    );
#elif TESTCASE == 2
    /* Stack pointer is corrupted upwards and placed above the original stack */
    __asm volatile (
        "ldr r0, =$0x20000FA0\n"
        "mov sp, r0\n"
        "isb\n"
        ::: "r0", "memory"
    );
#endif

    __asm volatile (
        "ldr r0, =$0x44332211\n"    /* Put markers in the registers to make stack frame more visible in memory view */
        "ldr r1, =$0x88776655\n"
        "bkpt\n"                    /* Last chance to inspect state of the core */
        "svc 123\n"                 /* Trigger exception */
        "bkpt\n"                    /* Halt again if SVC has returned */
        ::: "r0", "r1", "memory"    /* r1 is written above too, so it must be listed as clobbered */
    );

    return 0;
}

Implemented test cases (TESTCASE 0, 1 and 2, in that order):

  1. Simple stack overflow: SPLIM is sufficient to catch this.
  2. SP is moved below the current stack: SPLIM is sufficient to catch this. The exception is raised when SP is written (this is documented behavior too); no memory access is required.
  3. SP is moved above the current stack: the MPU is required to catch this.

SPLIM is mostly redundant when the MPU is active, but it may be useful when another MPU region is directly adjacent to the stack region, so that no MemManage fault is generated.

Both a thread ("regular") stack overflow and a context stacking failure set UFSR.STKOF. From the handler's point of view, the exact reason for the stack overflow is not important: the task context is lost anyway.
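As a rough host-side model of the three test cases, the sketch below predicts which mechanism rejects a 32-byte basic frame for a given SP. It is only my simplified reading of the rules, not a description of the hardware: the type and function names are hypothetical, the stack limit is checked before the MPU as an assumption, and TESTCASE 1 is modeled at stacking time even though the real core faults as soon as SP is written.

```cpp
#include <cassert>   // assert, for quick host-side checks
#include <cstdint>

enum class Outcome { Stacked, StackLimitFault, MemManageFault };

struct StackConfig {
    std::uint32_t limit;        // PSPLIM value (lowest legal SP)
    std::uint32_t region_base;  // first byte of the MPU stack region
    std::uint32_t region_end;   // one past the last byte of the MPU stack region
    bool limit_enabled;         // corresponds to ENABLE_STACK_LIMIT
    bool mpu_enabled;           // corresponds to ENABLE_MPU
};

constexpr std::uint32_t kFrameBytes = 32;  // basic frame, no FP context

// Values taken from the test program: PSP = 0x20000F80, PSS = 32.
const StackConfig kTestProgramConfig{0x20000F60u, 0x20000F60u, 0x20000F80u,
                                     true, true};

// The frame would occupy [sp - kFrameBytes, sp).
Outcome stacking_outcome(const StackConfig& cfg, std::uint32_t sp) {
    const std::uint32_t new_sp = sp - kFrameBytes;
    if (cfg.limit_enabled && new_sp < cfg.limit) {
        return Outcome::StackLimitFault;   // UFSR.STKOF
    }
    if (cfg.mpu_enabled && (new_sp < cfg.region_base || sp > cfg.region_end)) {
        return Outcome::MemManageFault;    // MMFSR.MSTKERR
    }
    return Outcome::Stacked;
}
```

With `kTestProgramConfig`, SP values `0x20000F7C` (TESTCASE 0) and `0x20000F00` (TESTCASE 1) land in the stack limit branch, while `0x20000FA0` (TESTCASE 2) passes the limit check and is only rejected by the MPU branch.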
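A fault handler can tell these cases apart from the combined CFSR value. The sketch below is a minimal, host-testable decoder; the bit positions follow the CFSR layout (MMFSR in bits 7:0, UFSR in bits 31:16), while the enum and function names are hypothetical. What the RTOS does with the result (kill or restart the task) is outside the sketch.

```cpp
#include <cassert>   // assert, for quick host-side checks
#include <cstdint>

// Bit masks within the combined CFSR register.
constexpr std::uint32_t kCfsrMstkerr = 1u << 4;   // MMFSR.MSTKERR: MPU fault on stacking
constexpr std::uint32_t kCfsrStkof   = 1u << 20;  // UFSR.STKOF: stack limit violation

enum class StackFault { None, MpuOnStacking, StackLimit };

// Classify a CFSR snapshot. In either stack fault case the task's
// context must be treated as lost.
StackFault classify_stack_fault(std::uint32_t cfsr) {
    if (cfsr & kCfsrStkof) {
        return StackFault::StackLimit;
    }
    if (cfsr & kCfsrMstkerr) {
        return StackFault::MpuOnStacking;
    }
    return StackFault::None;
}
```

On target, the handler would pass `SCB->CFSR` to this function and then write the set bits back to the register to clear them, since the CFSR status bits are write-one-to-clear.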

References

The observed behavior is documented in the following parts of the ARMv8-M Architecture Reference Manual:

  1. B3.18 Exception handling

    RWBND: Preemption of current execution causes the following basic sequence:

    • R0-R3, R12, LR, RETPSR, including CONTROL.SFPA, are stacked.
    • The return address is determined and stacked.
    • <...>
    • The exception to be taken is chosen, and IPSR.Exception is set accordingly. The setting of IPSR.Exception to a nonzero value causes the PE to change to Handler mode.

    This implies that context stacking happens while the PE is still in Thread mode, with all security restrictions still active.

  2. B3.19 Exception entry, context stacking

    RVNSK: If one or more of the following exceptions is generated during the stacking operations on exception entry the PE is permitted to abandon any remaining stacking operations:

    • MemManage fault
    • STKOF UsageFault

    IFKBH: If a MemManage fault, BusFault, or AUVIOL SecureFault occurs on a stacking memory access during exception entry, then stacking of Additional state context is optional.

  3. B3.21 Stack limit checks

    RZLZG: On a violation of a stack limit during either exception entry or tail-chaining:

    • In a PE with the Main Extension, a synchronous STKOF UsageFault is generated. Otherwise, a HardFault is generated.
    • The stack pointer is set to the stack limit value.
    • Push operations to addresses below the stack limit value are not performed.

    IBJHX: When an instruction updates the stack pointer, if it results in a violation of the stack limit, it is the modification of the stack pointer that generates the exception, rather than an access that uses the out-of-range stack pointer.

  4. B3.24 Exceptions during exception entry

    ILBGQ: During exception entry exceptions can occur <...>, for example a MemManage fault on the push to the stack.
    <...>
    When the exception entry sequence itself causes an exception, the latter exception is a derived exception.

    RMRTR: For Derived exceptions, late-arrival preemption is mandatory.

Maxim