4

I'm not trying to start a HAL holy war, I'm curious if this bit of code has some higher functionality.

When using the GPIOs on the a STM32 device you need to enable to GPIO clock using __HAL_RCC_GPIOA_CLK_ENABLE(). Digging into the code a bit more I found what exactly that function is doing shown below. From what I can tell the only useful part of that code is the SET_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN);, the rest just seems like useless bloat.

I can see why __IO uint32_t tmpreg; and UNUSED(tmpreg); could be used, but the line tmpreg = READ_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN); doesn't really do anything so all three seem pointless. Am I missing something?

#define __HAL_RCC_GPIOA_CLK_ENABLE()  \ 
    do { \
    __IO uint32_t tmpreg; \
    SET_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN);\ 
    tmpreg = READ_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN);\
    UNUSED(tmpreg); \
    } while(0)

SET_BIT

#define SET_BIT(REG, BIT)     ((REG) |= (BIT))

READ_BIT

#define READ_BIT(REG, BIT)    ((REG) & (BIT))

UNUSED

#define UNUSED(X) (void)X      /* To avoid gcc/g++ warnings */
pmacfarlane
  • 3,057
  • 1
  • 7
  • 24
Lpaulson
  • 109
  • 2
  • 12
  • I guess the function might make sense if the do loop was checking the "tmpreg" variable. – Lpaulson May 04 '23 at 22:36
  • 4
    I've seen forced "read after write" in PCI device driver interrupt handlers in the past, because the read ensures that the write has completed (and isn't still in transit on the bus) before the function ends. Sort of like a memory barrier. I'm not aware an STM32 has that problem though – pmacfarlane May 04 '23 at 22:39
  • 1
    I do know that you need to wait a few clocks after enabling a peripheral clock on an STM32 before you can access its registers. Maybe this enforces that. since the register is volatile. – pmacfarlane May 04 '23 at 22:40
  • 2
    The "read" could have a side-effect and seems to be there on purpose. The way to know is to read the manual very carefully. There are definitely cases of "bloat" in the STM32 HALs, but I am not sure this is one. The `do`-`while` part should not generate any code, but is there for a [good reason](https://stackoverflow.com/questions/154136/why-use-apparently-meaningless-do-while-and-if-else-statements-in-macros). – nielsen May 04 '23 at 22:44
  • @nielsen It _is_ bloat since it is needlessly slow for no good reason. If ST knew how many ticks the clock needs, then they could write the code accordingly. But I guess ST doesn't have that kind of detailed knowledge about STM32... either that or writing library-quality code isn't their priority. – Lundin May 05 '23 at 14:11

2 Answers2

7

This answer is completely speculation. Only STMicro could give the reason for this macro. But I think my theory is plausible, and it won't fit in a comment.

First thing to note is that STM32 micros have very flexible clock control. It's entirely possible that it could be running the CPU at 160MHz, and the peripheral bus at 1MHz.

Picking at random an STM32 I used recently, the STM32L4P5, you can look at section 6.2.18 of the reference manual where it talks about enabling peripheral clocks...

The enable bit has a synchronization mechanism to create a glitch free clock for the peripheral. After the enable bit is set, there is a 2 clock cycles delay before the clock be active

Therefore, after enabling the peripheral clock, you need to wait for 2 cycles before the clock is active. But this is not 2 cycles of the CPU clock, it is 2 cycles of the peripheral clock.

Therefore, the macro can't delay for 2 cycles by doing, say, some NOP instructions. It has to delay for cycles on the (potentially much slower) peripheral bus. This forced read (through a volatile pointer) may achieve that.

pmacfarlane
  • 3,057
  • 1
  • 7
  • 24
  • wowww, Very cool. – Lpaulson May 04 '23 at 23:23
  • 1
    `NOP` is a very bad decision if you want ad some delay. Its execution time can be zero as it will be discarded from the pipeline before getting into the execution state. So if you have instruction which executes 4 cycles and you execute from ITCRAM the micro may skip up to 20 of them (tested myself). It is used only as a **padding**. It is also quite clearly explained in the programming manual. – 0___________ May 05 '23 at 01:20
  • 1
    Here : https://developer.arm.com/documentation/dui0552/a/the-cortex-m3-instruction-set/miscellaneous-instructions/nop – 0___________ May 05 '23 at 01:25
  • @0___________ I meant a conceptual `NOP` like a pointless `ADD r0, #0`. My point was that you have to delay for peripheral clocks, not CPU clocks. – pmacfarlane May 05 '23 at 06:53
  • 4
    NOP was actually the standard way to deal with situations like this before Cortex M. Very handy way to count exact number of clock cycles without digging deep into the core manual. Optimizing out NOP on the hardware level was a mistake by ARM imo... I mean, the instruction isn't good for much else. – Lundin May 05 '23 at 14:07
3

SET_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN);, the rest just seems like useless bloat.

No, it is not.

  1. SET_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN); enables the clock
  2. tmpreg = READ_BIT(RCC->IOPENR, RCC_IOPENR_GPIOAEN); Operations on peripheral buses in STM32 uCs are strongly ordered so the read operation will be commenced when the write has finished. Also, this read gives enough time for the GPIO clock to settle up.
  3. UNUSED(tmpreg); is silencing the compiler warning about unused variable.

It is written quite a conservative way (not the wrong way). It is possible to avoid that volatile variable by simply UNUSED(RCC->IOPENR);. This operation will also take more than 2 cycles and will not be optimized by the compiler as IOPENR is declared as volatile

https://godbolt.org/z/MnPK9v37v

Bare in mind that readback will not work on the busses which are not strongly ordered. In this circumstance, the read can be done before the write finishes.

0___________
  • 60,014
  • 4
  • 34
  • 74
  • Don't they provide a flag which you can poll in a loop? Microchip SAM ARMs work very similar but then you just poll for a flag in a busy-wait while until the clock is up. – Lundin May 05 '23 at 14:04
  • @Lundin no, they not Readback is enough as it is very fast (two processor cycles after write) process on STM32. Atmel clock peripheral start is significantly slower and they needed to provide such flag. – 0___________ May 05 '23 at 14:19
  • Yeah but the macro posted by the OP is much longer than 2 cycles. Pasted together with your code for comparison: https://godbolt.org/z/a9oe9jcbb. – Lundin May 05 '23 at 14:26
  • @Lundin Readback is enough. I think it was written to avoid MISRA static analyzers problems. – 0___________ May 05 '23 at 14:30
  • For MISRA compliance you'd get rid of all these macros in the first place. Function-like macros are frowned upon by MISRA, meaning you need a solid rationale to motivate using one. And there exists no solid rationale for why you'd ever write a macro like `#define SET_BIT(REG, BIT) ((REG) |= (BIT))`. `REG |= BIT;` is already as readable as C code gets, so the macro is just bloat and a potential bug source. – Lundin May 08 '23 at 06:31