int main(){
__asm volatile
{
    // load data
    vld1.16 {q0, q1}, [r0]!

...

I compile it with the following command:

armcc --cpu=Cortex-A9 -O2 -Otime --vectorize --remarks -g --md --depend_format=unix_escaped --no_depend_system_headers -c -o test.o test.c

The compiler reports the following errors:

"test.c", line 7: Error:  #20: identifier "q0" is undefined
      vld1.16 {q0, q1}, [r0]!
               ^
"test.c", line 8: Error:  #20: identifier "q2" is undefined
          vld1.16 {q2, q3}, [r0]!
                   ^

Did I miss any flags in the armcc command?

The armcc version is

Product: ARM Compiler 5.05
Component: ARM Compiler 5.05 (build 41)
Tool: armcc [4d0eb9]
For support see http://www.arm.com/support/
Software supplied by: ARM Limited
Tmx
  • Just FYI, intrinsics for manual vectorization are usually a better choice than inline asm, since compilers do a good job with them these days. (At least gcc and clang do; no idea about armcc.) See https://gcc.gnu.org/wiki/DontUseInlineAsm for reasons why, including that it can actually lead to slower code by preventing the compiler from propagating constants, or other optimizations. – Peter Cordes Nov 04 '16 at 02:26

1 Answer


Although I don't use armcc, I don't believe your compiler supports inline assembly for NEON.

https://static.docs.arm.com/dui0472/k/DUI0472K_armcc_user_guide.pdf

Take a look at section 7.3, which states:

7.3 Restrictions on inline assembler support in the compiler

The inline assembler in the compiler does not support a number of instructions. Specifically, the inline assembler does not support:

• Thumb assembly language in processors without Thumb-2 technology.
• VFP instructions that were added in VFPv3 or higher.
• NEON instructions.
• The ARMv6 SETEND instruction and some of the system extensions.
• ARMv5 BX, BLX, and BXJ instructions.

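The reason it probably gets as far as it does is that VFPv2 loads such as vldr and vldm are supported, so the "vld" prefix is familiar to the inline assembler. vld1 itself is a NEON instruction, though, and the assembler only complains once it reaches the "q" register operand, which it reports as an undefined identifier.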

If you were using a gcc/clang variant, then yes, I'd suggest you would need to explicitly target NEON with -march=armv7-a -mfpu=neon, specifying both the base ISA and the floating-point unit extension, but only in order to use compiler intrinsics, not inline assembly (as mentioned in the comments).
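As a rough illustration of the intrinsics route, here is a minimal sketch of a load/add/store loop written with `arm_neon.h` instead of inline assembly. The function name `add_u16` and the operation itself are hypothetical, since the question does not show what the code does after the loads. The sketch also assumes the compiler predefines `__ARM_NEON` or `__ARM_NEON__` when NEON is enabled, and it falls back to plain C otherwise so that it builds on any target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compiler predefines one of these macros when NEON is on. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n elements.
 * n must be a multiple of 8 (one 128-bit Q register holds 8 x uint16_t). */
void add_u16(uint16_t *dst, const uint16_t *a, const uint16_t *b, size_t n)
{
    size_t i;

    assert(n % 8 == 0);

#if HAVE_NEON
    for (i = 0; i < n; i += 8) {
        /* Each vld1q_u16 loads 8 lanes, much like vld1.16 into one Q register. */
        uint16x8_t va = vld1q_u16(a + i);
        uint16x8_t vb = vld1q_u16(b + i);
        vst1q_u16(dst + i, vaddq_u16(va, vb));
    }
#else
    /* Portable fallback so the sketch also builds without NEON. */
    for (i = 0; i < n; ++i) {
        dst[i] = (uint16_t)(a[i] + b[i]);
    }
#endif
}
```

With intrinsics the compiler does the register allocation, so there are no explicit q0/q1 names for the front end to reject.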

Peter M
  • With gcc, inline asm is passed through from the compiler to the assembler unchanged (just with `%[name]` operands substituted, and so on), so `-march` options only affect inline asm if they result in some kind of directive in the compiler's asm output. IDK about ARM, but for x86, `-msse4` or whatever just tells the compiler it can use those instructions itself, and isn't required for you to use them from inline asm. (It is required for using the relevant intrinsics.) – Peter Cordes Nov 04 '16 at 02:23
  • Thanks, yes, I'm constantly amazed at how basic/stupid gcc's assembler is. IIRC, ARM's tools were much better, and probably pass architecture flags to the assembler to help check ISA compatibility (although it has been 8 years since I've used them myself). – Peter M Nov 04 '16 at 22:14
  • If you're hand-writing asm, you can use NASM or YASM macros to check ISA compatibility, like the x264 project does. I haven't looked into how they do it (whether it's macros or whether assembler support is needed), or if it can be adapted to gas. Anyway, the purpose of `-march` is to tell the compiler what *it* can emit, not to limit what you can do with inline asm. If you avoid `-march=` or `-msse4`, you can manually check CPUID before using SSE4 instructions. (Of course, the better way to do that is separate compilation units, but the gcc/gas design makes sense this way.) – Peter Cordes Nov 05 '16 at 22:10