
I measured the core cycle count using DWT->CYCCNT, but the result differs from what I predicted. Could you tell me the cause?

My device is an STM32 NUCLEO-L476RG. I only read DWT->CYCCNT and varied the number of integer assignments between the two reads.

  m_nStart = DWT->CYCCNT;  
  m_nStop = DWT->CYCCNT;
  printf("Cycle diff - assign 0 : %lu\n", m_nStop - m_nStart);

  m_nStart = DWT->CYCCNT;  
  i = 10;
  m_nStop = DWT->CYCCNT;
  printf("Cycle diff - assign 1 : %lu\n", m_nStop - m_nStart); 

  m_nStart = DWT->CYCCNT;  
  i = 10;
  i = 20;
  m_nStop = DWT->CYCCNT;
  printf("Cycle diff - assign 2 : %lu\n", m_nStop - m_nStart); 

  m_nStart = DWT->CYCCNT;  
  i = 10;
  i = 20;
  i = 30;
  m_nStop = DWT->CYCCNT;
  printf("Cycle diff - assign 3 : %lu\n", m_nStop - m_nStart); 

  m_nStart = DWT->CYCCNT;  
  i = 10;
  i = 20;
  i = 30;
  i = 40;
  m_nStop = DWT->CYCCNT;
  printf("Cycle diff - assign 4 : %lu\n", m_nStop - m_nStart);

I expected the cycle count to be proportional to the number of assignments, but this is the result:

Cycle diff - assign 0 : 14

Cycle diff - assign 1 : 16

Cycle diff - assign 2 : 18

Cycle diff - assign 3 : 20

Cycle diff - assign 4 : 22

Why is the result like that?

larein
  • 2
    `to be proportional to the number of assignments` - your compiler is way smarter, it optimizes it. All the `i = ` assignments were optimized out and this is all a no-op. How is `i` declared? What compiler and compiler options are you using? Did you inspect the generated assembly to confirm that the compiler really emitted instructions for the assignment operations? – KamilCuk Aug 20 '19 at 11:32
  • 2
    Once you account for some overhead, that is roughly proportional. Looks like `14 + (2 * assignments)`. – Thomas Jager Aug 20 '19 at 11:36
  • @Kamil `i` is a local variable, the compiler is arm-none-eabi-gcc, and I am not using any optimization option. I have not checked the assembly, because I don't know how to. Is there a good document explaining how to view the assembly? – larein Aug 20 '19 at 11:48
  • Not really, it's all compiler dependent. But a short search about gcc leads to [this stackoverflow question](https://stackoverflow.com/questions/1354899/how-can-i-see-the-assembly-code-that-is-generated-by-a-gcc-any-flavor-compiler). `i` being a local variable makes it worse - it's easier to optimize, but I asked how it is "declared" - is it `volatile`? If not, try with `i` being `volatile`... – KamilCuk Aug 20 '19 at 11:49
  • @Kamil I changed the local variable declaration from "int i=0" to "volatile int i=0;", but the result is the same. I then changed it to a global variable, and the result is more irregular: 8, 17, 22, 27, 35 – larein Aug 20 '19 at 11:58
  • 3
    `8, 17, 22, 27, 35` - looks like a valid result. Looks like the overhead for reading DWT is 8 cycles, each assignment takes ~4 cycles, and the compiler optimizes it a lot. You don't know on which cycle the DWT will be read, so it can return any value between 0-4. Ideal results probably look like `8, 16, 24, 28, 32`, and this looks almost like it. – KamilCuk Aug 20 '19 at 12:42
  • @KamilCuk That's almost exactly what you'd expect from this. – Thomas Jager Aug 20 '19 at 12:43
  • 1
    Use `arm-none-eabi-objdump -dS myprogram.elf >myprogram.asm` to disassemble the ELF binary, and include the C source inline with the assembly. Very very much recommended when working at this level. – unwind Aug 20 '19 at 13:11

1 Answer


It is difficult to predict the number of cycles needed to execute one line of C code on an ARM Cortex. It depends on the compiler, on the optimisation level you set, on the way you declared the variables, on whether the cache is enabled, on where the code is executing from (RAM or Flash), etc.

You can see here the assembly it may give.

Every assignment consists of one mov and one str, so two assembly instructions. But even knowing which assembly instructions are executed does not always let you deduce a precise number of cycles, because of pipelining, caching policy, etc.

In the end, the only way to get meaningful figures is to measure a portion of code, as you did.

However, the code you are measuring here may not make a lot of sense (assigning several values without doing anything in between, except perhaps if i is a register).

Guillaume Petitjean