1

I'm trying to assemble a file that uses ARM's CRC instruction. The assembler is producing an error Error: selected processor does not support 'crc32b w1,w0,w0'.

There are runtime checks in place, so we are safe with the instruction. The technique works fine on i686 and x86_64. For example, I can assemble a file that uses Intel CRC intrinsics or SHA Intrinsics without -mcrc or -msha (and on a machine without the features).

Here is the test case:

$ cat test.cxx
#include <arm_neon.h>

#define GCC_INLINE_ATTRIB __attribute__((__gnu_inline__, __always_inline__, __artificial__))

#if defined(__GNUC__) && !defined(__ARM_FEATURE_CRC32)
__inline unsigned int GCC_INLINE_ATTRIB
CRC32B(unsigned int crc, unsigned char v)
{
  unsigned int r;
  asm ("crc32b %w2, %w1, %w0" : "=r"(r) : "r"(crc), "r"((unsigned int)v));
  return r;
}
#else
  // Use the intrinsic
# define CRC32B(a,b) __crc32b(a,b)
#endif

int main(int argc, char* argv[])
{
  return CRC32B(argc, argc);
}

And here is the result:

$ g++ test.cxx -c
/tmp/ccqHBPUf.s: Assembler messages:
/tmp/ccqHBPUf.s:23: Error: selected processor does not support `crc32b w1,w0,w0'

Placing the ASM code in a source file and compiling with different options is not feasible because CRC32B will be used in C++ header files, too.

How do I get GAS to assemble the instruction?


GCC's configuration and options are the reason we are trying to do things this way. User's don't read manuals, so they won't add -march=armv8-a+crc+crypto -mtune=cortex-a53 to CFLAGS and CXXFLAGS.

In addition, distros compile to a "least capable" machine, so we want the hardware acceleration routines available. When the library is provided by a distro like Linaro, both code paths (software CRC and hardware accelerated CRC) will be available.


The machine is a LeMaker HiKey, which is ARMv8/Aarch64. It has an A53 processor with CRC and Crypto (CRC and Crypto is optional under the architecture):

$ cat /proc/cpuinfo
Processor       : AArch64 Processor rev 3 (aarch64)
processor       : 0
...
processor       : 7
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: AArch64

GCC lacks most of the usual defines one expects to be present by default:

$ g++ -dM -E - </dev/null | sort | egrep -i '(arm|neon|aarch|asimd)'
#define __aarch64__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __AARCH64EL__ 1

Using GCC's -march=native does not work on ARM:

$ g++ -march=native -dM -E - </dev/null | sort | egrep -i '(arm|neon|aarch|asimd)'
cc1: error: unknown value ‘native’ for -march

And Clang:

$ clang++ -dM -E - </dev/null | sort | egrep -i '(arm|neon|aarch|asimd)'
#define __AARCH64EL__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xe
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_FP_FENV_ROUNDING 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xe
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4
#define __aarch64__ 1

GCC version:

$ gcc -v
...
gcc version 4.9.2 (Debian/Linaro 4.9.2-10)

GAS version:

$ as -v
GNU assembler version 2.24 (aarch64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.24
Community
  • 1
  • 1
jww
  • 97,681
  • 90
  • 411
  • 885
  • 1
    I don't have an environment to test this, but it sounds like what you need is `.arch_extension name`. Perhaps added directly to this asm instruction. According to the [docs](https://sourceware.org/binutils/docs/as/ARM-Directives.html), this allows you to *add or remove extensions incrementally to the architecture being compiled for*. Failing that, perhaps adding an `.arch name` as a "top level" bit of 'basic' asm? – David Wohlferd Apr 19 '17 at 19:52
  • Or are there more constraints here that I'm missing? Adding directives to asm instructions like this is nothing new. People have been using it to put intel-style assembler into asm instructions since forever. – David Wohlferd Apr 20 '17 at 09:12
  • @David - Thanks. I thought along the same lines. Alas, A-32, Aarch32 and Aarch64 are not IA32. I tried `.arch_extension` yesterday, but it resulted in errors. `.arch_extension` needs Binutils 2.26 from 2016. 2.26 has the support for both Aarch32 and Aarch64. Also see [Error: unknown pseudo-op: `.arch_extension'](https://lists.linaro.org/pipermail/linaro-toolchain/2017-April/006112.html) on the Linaro Toolchain mailing list. – jww Apr 20 '17 at 09:22
  • Sounds like `#pragma GCC target` is going to be a better bet than using (a potentially conflicting) `.arch` asm at file scope. Something to keep in mind though. – David Wohlferd Apr 20 '17 at 10:06
  • @David - Thanks. `#pragma GCC target` needs GCC 6 for Aarch64. And a correction: Binutils 2.26 is from 2014; not 2016. – jww Apr 20 '17 at 10:18

1 Answers1

2

This answer came from Jiong Wang on the Binutils mailing list. It bypasses GAS's architectural requirements and plays well with GCC:

__inline unsigned int GCC_INLINE_ATTRIB
CRC32W(unsigned int crc, unsigned int val)
{
#if 1
    volatile unsigned int res;
    asm ("\n"
         "\t" ".set reg_x0, 0\n"
         "\t" ".set reg_x1, 1\n"
         "\t" ".set reg_x2, 2\n"
         "\t" ".set reg_x3, 3\n"
         "\t" ".set reg_x4, 4\n"
         "\t" ".set reg_x5, 5\n"
         "\t" ".set reg_x6, 6\n"
         "\t" ".set reg_x7, 7\n"
         "\t" "#crc32w %w0, %w1, %w2\n"
         "\t" ".inst 0x1ac04800 | (reg_%2 << 16) | (reg_%1 << 5) | (reg_%0)\n"
         : "=r"(res) : "r"(crc), "r"(val)
    );
    return res;
#else
    volatile unsigned int res;
    asm (".cpu generic+fp+simd+crc+crypto  \n"
         "crc32w %w0, %w1, %w2             \n"
         : "=r"(res) : "r"(crc), "r"(val));
    return res;
#endif
}

The second one commented out by the preprocessor block was suggested by Nick Clifton on the Binutils mailing list. The idea is GCC generates code using the ISA based on -march=XXX, so it does not matter if we increase capabilities to get past the assembler. We decided to go with Wang's answer because we did not want potential side effects from modifying the .cpu.

And the verification with GCC 4.8 and Binutils 2.24:

$ g++ -O1 test.cxx -c

$ objdump --disassemble test.o

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <main>:
   0:   12001c01        and     w1, w0, #0xff
   4:   1ac14800        crc32w  w0, w0, w1
   8:   d65f03c0        ret
jww
  • 97,681
  • 90
  • 411
  • 885