gcc efficient byte copy for ARM Cortex-M4

Question

Is there a built-in GCC memory copy function that is specifically optimized for the architecture of the ARM Cortex-M4?

The processor core is only half of the performance problem; the chip implementation is the other half, and the libraries are optimized for the processor side. So it is possible to outperform their copy, not just for the reasons cooperised mentioned but also because of the nature of the chip design. — old_timer, Feb 28 '19 at 15:33
Also, there are compile-time options for these ARM cores that are not necessarily reflected in the CPUID registers, so that knowledge can also affect performance for a specific implementation. But a simple look at a C library would have shown you that they already have architecture-specific memcpy implementations, so there was no need to ask the question here. — old_timer, Feb 28 '19 at 15:34
"There are other ways to find out this information so you shouldn't be asking a question here" is weird logic IMO. Asking reasonable, on-topic questions (which this is) and getting answers from other people is what this site is about. — cooperised, Feb 28 '19 at 16:38

score 7 · Accepted Answer · answered Feb 03 '19 at 11:45

Yes - memcpy. Compilers and standard libraries generally have well-optimised versions of memcpy for each target platform. That's not to say that you can't beat the speed of memcpy in specific situations with knowledge of the nature of the data and its alignment, but in general you should trust the writers of the standard library to have done a good job. See this question and its answers.
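As an illustration of the "knowledge of alignment" point, here is a minimal sketch, assuming both buffers are known to be 4-byte aligned. The wrapper name `copy_words_aligned` is hypothetical; `__builtin_assume_aligned` is a real GCC built-in that passes the alignment guarantee on to the compiler, so it is free to use word accesses if it expands the copy inline. Whether this actually beats a plain memcpy call depends on the toolchain, the optimization flags and the chip, so measure before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper around memcpy.
 * Assumption: the caller guarantees that dst and src are both 4-byte
 * aligned. Passing a misaligned pointer is undefined behaviour. */
static inline void *copy_words_aligned(void *dst, const void *src, size_t n)
{
    /* Tell GCC about the alignment; the copy itself is still memcpy. */
    void *d = __builtin_assume_aligned(dst, 4);
    const void *s = __builtin_assume_aligned(src, 4);
    return memcpy(d, s, n);
}
```

Buffers declared as `uint32_t` arrays satisfy the alignment assumption; for anything else, plain memcpy is the safe default.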

score 1 · Answer 2 · answered Feb 28 '19 at 15:14

For large blocks*, it is worth looking at DMA options, which are widely available across the Cortex-M4 microcontroller range. DMA is efficient in that the CPU is free to do other work while the transfer is in progress.

Unfortunately, the Arm Embedded GCC compiler does not have native support for DMA, so you will have to rely on your semiconductor supplier's code.

*Because setting up the DMA controller takes some time, it might not be efficient for small blocks.
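The size trade-off above could be sketched roughly as follows. Everything DMA-related here is an assumption: `dma_copy_fn` stands in for whatever memory-to-memory routine your vendor's code provides, `DMA_MIN_BYTES` is an arbitrary threshold that you would tune by measurement, and `fake_dma_copy` is only a software stand-in so the sketch can run on a host. The sketch also assumes the routine returns 0 only once the transfer is complete; with a real asynchronous transfer you would have to wait for completion before touching either buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a vendor-supplied memory-to-memory DMA
 * routine. Assumed to return 0 once the transfer has completed. */
typedef int (*dma_copy_fn)(void *dst, const void *src, size_t n);

/* Assumed threshold below which DMA setup cost outweighs the benefit. */
#define DMA_MIN_BYTES 256u

/* Software stand-in for the vendor routine, for host testing only. */
static int fake_dma_calls = 0;
static int fake_dma_copy(void *dst, const void *src, size_t n)
{
    fake_dma_calls++;
    memcpy(dst, src, n);
    return 0;
}

/* Use DMA for large blocks, memcpy for small ones or if DMA fails. */
static void copy_block(void *dst, const void *src, size_t n, dma_copy_fn dma)
{
    if (dma != NULL && n >= DMA_MIN_BYTES && dma(dst, src, n) == 0) {
        return;
    }
    memcpy(dst, src, n);
}
```

Calling `copy_block(dst, src, 16, fake_dma_copy)` stays on the memcpy path, while a 512-byte copy goes through the DMA hook.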
