
I'm using an STM32F429 with an ARM Cortex-M4 processor. I should say up front that I don't know ARM assembly, but I need to optimize my code. I read the solution to

How to measure program execution time in ARM Cortex-A8 processor?

which is what I need, but that solution is for the Cortex-A8. On a whim, I tried the code from the link above in my project, but I get a SEGV at this point:

if (enable_divider)
    value |= 8;     // enable "by 64" divider for CCNT.

  value |= 16;

  // program the performance-counter control-register:
  asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));  /*<---Here I have SEGV error*/

  // enable all counters:  
  asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));  

  // clear overflows:
  asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));

How can I adapt this assembly code to run on the ARM Cortex-M4?

Community
Develobeer
  • Hi Chris. I read the answer at the link. In the other link, given by Throwback1986, he suggests using DWT_CYCCNT. OK, but looking at that solution, I don't understand how to apply it to my code (the code in my question). Sorry, I'm a newbie at ARM assembly :( – Develobeer Nov 23 '14 at 17:04
  • Most `MCR p15` instructions assume you are not in **user mode**. Also, check that the instruction applies to your CPU: most `MCR p15` instructions are specific to a particular CPU (Cortex-M or Cortex-M3). – artless noise Nov 23 '14 at 18:26
  • The SEGV (I assume that's a HardFault) is probably due to some hardware being accessed without its clock power turned on. As you're new to ARM, you should know that the microcontroller starts up in low-power mode. That means you'll have to turn on peripherals yourself. If you try accessing a peripheral without turning it on first, you'll get an exception (i.e. a crash). See the manuals I've mentioned in my comment under my answer below. –  Nov 24 '14 at 00:22

1 Answer


Ditch the Cortex-A8 method.

This is the correct way to do it for most Cortex-M based microcontrollers (do not use SysTick!):

  1. Set up a timer that runs at the same speed as the CPU.
  2. Do not attach an interrupt to the timer.
  3. Read the timer value with a single LDR instruction before you start measuring.
  4. Execute a NOP instruction, then run the code you want to measure.
  5. Execute a NOP instruction, then read the timer value with a single LDR instruction when you finish measuring.

The NOP instructions are for accuracy: they make sure pipelining does not disturb your results. This is necessary on the Cortex-M4 because one LDR instruction takes two clock cycles, but two contiguous LDR instructions can be pipelined, so together they take only 3 clock cycles. See the Cortex-M4 Technical Reference Manual at the ARM Information Center for more information on instruction timing.
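As a rough illustration of steps 3 to 5, here is a minimal C sketch. It assumes the timer is already clocked and free-running as a 32-bit up-counter, and that `counter_reg` points at its count register. The pointer and all function names are hypothetical and not taken from any ST header. The inline `nop` uses GCC/Clang syntax.

```c
#include <stdint.h>

/* Read a free-running up-counter once. On the target this should compile
 * to a single LDR. `counter_reg` is a hypothetical pointer to the timer's
 * count register; it is an assumption, not a real ST definition. */
static inline uint32_t read_counter(volatile const uint32_t *counter_reg)
{
    return *counter_reg;
}

/* Ticks between two reads of a 32-bit up-counter. Unsigned subtraction
 * gives the right answer across a single wrap-around. `overhead` is the
 * result of measuring an empty function, found by calibration. */
static inline uint32_t elapsed_ticks(uint32_t start, uint32_t end,
                                     uint32_t overhead)
{
    return (end - start) - overhead;
}

/* Steps 3 to 5: read, NOP, run the code, NOP, read. */
static inline uint32_t measure(volatile const uint32_t *counter_reg,
                               void (*fn)(void), uint32_t overhead)
{
    uint32_t start = read_counter(counter_reg);
    __asm__ volatile ("nop");
    fn();
    __asm__ volatile ("nop");
    uint32_t end = read_counter(counter_reg);
    return elapsed_ticks(start, end, overhead);
}
```

One way to use it is to call `measure` once with an empty function and `overhead` set to 0, then pass that result as `overhead` for the real measurement. Whether the compiler really emits one LDR per read is worth checking in the disassembly.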

Of course, you should run your code from internal SRAM, in order to make sure it's not slowed down by the slow Flash memory.

I cannot guarantee that this will be 100% cycle-accurate on all devices, but it should get very close. (See Chris' comment below). You should also know that this is intended to be used in an environment with no interrupts.
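The comments on the question mention DWT_CYCCNT as an alternative to a peripheral timer. A minimal sketch of enabling and reading it follows. The three addresses are the architecturally defined locations of DEMCR, DWT_CTRL and DWT_CYCCNT on ARMv7-M; that a given device implements the cycle counter is an assumption to check in its reference manual. The struct and function names are hypothetical, and the registers are passed in as pointers only so the logic can be exercised off-target.

```c
#include <stdint.h>

/* Hypothetical grouping of the three registers involved. */
typedef struct {
    volatile uint32_t *demcr;      /* Debug Exception and Monitor Control */
    volatile uint32_t *dwt_ctrl;   /* DWT control register */
    volatile uint32_t *dwt_cyccnt; /* DWT cycle counter */
} cyccnt_regs_t;

#define DEMCR_TRCENA       ((uint32_t)1u << 24) /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA ((uint32_t)1u << 0)  /* starts the cycle counter */

/* Register locations on an ARMv7-M core (Cortex-M3/M4). */
static inline cyccnt_regs_t cyccnt_target_regs(void)
{
    cyccnt_regs_t r = {
        (volatile uint32_t *)(uintptr_t)0xE000EDFCu,
        (volatile uint32_t *)(uintptr_t)0xE0001000u,
        (volatile uint32_t *)(uintptr_t)0xE0001004u,
    };
    return r;
}

/* Turn on trace, clear the counter, then start it. */
static inline void cyccnt_enable(cyccnt_regs_t r)
{
    *r.demcr |= DEMCR_TRCENA;
    *r.dwt_cyccnt = 0u;
    *r.dwt_ctrl |= DWT_CTRL_CYCCNTENA;
}

/* Current cycle count; differences use unsigned 32-bit subtraction. */
static inline uint32_t cyccnt_read(cyccnt_regs_t r)
{
    return *r.dwt_cyccnt;
}
```

On the target, one would call `cyccnt_enable(cyccnt_target_regs())` once at start-up and take the difference of two `cyccnt_read` values around the code under test. The same caveats about interrupts and Flash speed apply.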

Venemo
  • Hi PacMan! I understand your answer in theory, but I'm a newbie at ARM assembly. Can you post some example code for this? – Develobeer Nov 23 '14 at 16:56
  • Unfortunately I can't post example code for the STM, as I will not be able to verify it. (STM is a fine product, but I'm working on other Cortex-M3 MCUs). I can say, though, that you'll need to turn on clock power for the timer *before* you start configuring it, otherwise you'll get a HardFault crash. You can set up the timer using a C function (no problem there), and if you do not need a cycle-accurate result, you can also use a C function to read the Timer Counter register. –  Nov 23 '14 at 17:01
  • (For those who use STM-devices, feel free to implement the above and post the code as an answer) –  Nov 23 '14 at 17:01
  • Access to a timer on the peripheral bus may not have fixed cost if other parts of the chip are active and able to trigger the bus arbiter. – Chris Stratton Nov 23 '14 at 17:18
  • @Chris Stratton: I believe you're right. Unfortunately, there's no guarantee that a microcontroller has a timer either (but fortunately most ARM Cortex-M devices do). Since the STM has a stop-watch feature, I'll need to change my answer. ;) –  Nov 23 '14 at 23:43
  • @Anth I changed 'Cortex-M3' in your question and my answer to 'Cortex-M4', as the STM32F429 is a Cortex-M4 microcontroller. Below are a few useful links, including the manuals from ST. –  Nov 24 '14 at 00:14
  • PDF-versions of the Cortex-M4 references from ARM (eg. the [User Guide](http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf) + the [Technical Reference Manual](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_processor_r0p1_trm.pdf)). [UM1653](http://www.st.com/web/en/resource/technical/document/user_manual/DM00091013.pdf), [RM0090](http://www.st.com/web/en/resource/technical/document/reference_manual/DM00031020.pdf), [PM0214](http://www.st.com/web/en/resource/technical/document/programming_manual/DM00046982.pdf). –  Nov 24 '14 at 00:18
  • You're right @PacMan. I don't know why I wrote Cortex-M3. I meant the M4 xD Thank you for editing – Develobeer Nov 24 '14 at 00:34
  • No problem. Look at it this way: you just got a much better microcontroller free of charge. ;) The Cortex-M4 does everything that the Cortex-M3 does, but it does it faster and has extra instructions. I forgot to mention the [ARM Connected Community](http://community.arm.com/) above. –  Nov 24 '14 at 02:28