Questions tagged [sve]

For questions related to the optional 'Scalable Vector Extension' (SVE) for ARMv8-A at either the system or application level.

The Scalable Vector Extension (SVE) was added to the ARMv8-A architecture in 2018. It is not mandatory for ARMv8 platforms, but if it is included, the full base instruction set must be supported.

From the ARM Architecture Reference Manual (ARM ARM) Supplement on SVE:

About the Scalable Vector Extension
The Scalable Vector Extension (SVE) is an optional extension to the ARMv8-A architecture, with a base requirement of ARMv8.2-A. SVE complements and does not replace AArch64 Advanced SIMD and floating-point functionality. If SVE is implemented, all SVE instructions are mandatory and the ARMv8.2-FP16 half-precision floating-point and the ARMv8.3-CompNum complex number instructions must be implemented.

SVE has some implications for system-level programming, because the additional register state must be saved and restored on context switches. However, its main use is as SIMD infrastructure at the application level. Questions on either topic are welcome under this tag.
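To make the application-level, vector-length-agnostic (VLA) style concrete, here is a minimal sketch written as a plain-C scalar model so it compiles and runs on any host, including machines without SVE. The names `whilelt_model`, `vla_subtract`, and `MAX_LANES` are hypothetical; the comments name the ACLE intrinsics from `<arm_sve.h>` (`svcntw`, `svwhilelt_b32`, `svld1`, `svsub_x`, `svst1`) that real SVE code would use at each step. The key idea is that the loop never hard-codes the vector length: a predicate disables the lanes past the end of the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Most 32-bit lanes any SVE vector can hold (2048-bit maximum). */
#define MAX_LANES (2048 / 32)

/* Hypothetical model of svwhilelt_b32(base, n): lane l is active while base + l < n. */
static void whilelt_model(size_t base, size_t n, size_t lanes, int pred[MAX_LANES]) {
    for (size_t l = 0; l < lanes; l++)
        pred[l] = (base + l < n);
}

/* a[i] = b[i] - c[i] for i < n, one simulated vector per iteration.
 * vl_bits models the hardware vector length; real SVE code never hard-codes it.
 * Returns the number of vector iterations, or 0 for an invalid vl_bits. */
size_t vla_subtract(int32_t *a, const int32_t *b, const int32_t *c,
                    size_t n, unsigned vl_bits) {
    if (vl_bits < 128 || vl_bits > 2048 || vl_bits % 128 != 0)
        return 0;
    size_t lanes = vl_bits / 32;              /* what svcntw() would report */
    int pred[MAX_LANES];
    size_t iterations = 0;
    for (size_t i = 0; i < n; i += lanes) {
        whilelt_model(i, n, lanes, pred);     /* pg = svwhilelt_b32(i, n) */
        for (size_t l = 0; l < lanes; l++) {
            /* svld1 + svsub_x + svst1 under pg: inactive lanes touch no memory */
            if (pred[l])
                a[i + l] = b[i + l] - c[i + l];
        }
        iterations++;
    }
    return iterations;
}
```

The same source handles every legal vector length; only the iteration count changes (for 10 elements: 3 iterations at 128 bits, 1 iteration at 512 bits), and the final partial vector needs no scalar tail loop.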


16 questions
6 votes, 2 answers

How portable are the new ARM SVE instructions?

I am looking for information about the new Scalable Vector Extension (SVE) from Arm. It looks very promising for image processing, since it can compute on 2048 bits in parallel. But I'm not sure if it will be running on every…
3 votes, 1 answer

How can I generate SVE vectors with LLVM

clang version 11.0.0 example.c: #define ARRAYSIZE 1024 int a[ARRAYSIZE]; int b[ARRAYSIZE]; int c[ARRAYSIZE]; void subtract_arrays(int *restrict a, int *restrict b, int *restrict c) { for (int i = 0; i < ARRAYSIZE; i++) { a[i] = b[i]…
YGG
3 votes, 2 answers

ARMv8 with Scalable Vector Extension (SVE)

I came across the point that ARMv8 now supports variable-length vector registers from 128 bits to 2048 bits (the Scalable Vector Extension, SVE). Wider registers are generally good for exploiting data-level parallelism. But on what…
user3476225
2 votes, 1 answer

ARM-SVE: wrapping runtime sized register

In the generic SIMD library eve, we were looking into supporting length-agnostic SVE. However, we cannot wrap a sizeless register in a struct to do meta-programming around it: struct foo { svint8_t a; }; Is there a way to do it? Either clang…
Denis Yaroshevskiy
2 votes, 1 answer

What are the int8 matrix multiply instructions in Neoverse V1?

This WikiChip article states that Neoverse V1 has int8 instructions that allow 256 operations per CPU clock (per core, presumably): I'm trying to understand what these instructions are. Do they take int8 input and accumulate the results in int8's…
MWB
1 vote, 0 answers

How is an SVE program's thread set with the `TIF_SVE` flag in the Linux kernel?

I read the Linux AArch64 source code: https://elixir.bootlin.com/linux/latest/source/arch/arm64/kernel/fpsimd.c#L311 It says that TIF_SVE controls the SVE register state save/restore mechanism. My userspace program is using SVE instructions…
aisv
1 vote, 1 answer

ARM SVE: svld1(mask, ptr) vs svldff1(svptrue<>, ptr)

In ARM SVE there are masked load instructions (svld1), and there are also first-faulting loads (svldff1(svptrue<>)). Questions: Does it make sense to use svld1 with a mask as opposed to svldff1? The behaviour of the mask in svldff1 seems confusing. Is there a…
Denis Yaroshevskiy
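A hedged sketch of the difference: a masked `svld1` only accesses lanes whose predicate bit is active, so it is safe whenever the valid range is known up front (for example, a `svwhilelt` predicate). A first-faulting `svldff1` is meant for when the bound is unknown (for example, scanning a NUL-terminated string): only the first active lane may take a fault; a fault in a later lane is suppressed and recorded by clearing that lane and all following lanes of the first-fault register (FFR), which code resets with `svsetffr()` and reads back with `svrdffr_z(pg)`. The plain-C model below illustrates those semantics under stated assumptions; `ldff1_model` and the `readable` array are hypothetical stand-ins (real hardware consults the MMU, and is also allowed to stop early for other reasons, which this model ignores).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a first-faulting load (LDFF1, svldff1) after svsetffr().
 * readable[i] stands in for "lane i's address is mapped".
 * Returns false if the first active lane is unreadable (a real LDFF1 takes the fault).
 * Otherwise fills dst and ffr and returns true. */
bool ldff1_model(const int32_t *src, const bool *readable, const bool *pg,
                 size_t lanes, int32_t *dst, bool *ffr) {
    bool seen_first_active = false;
    bool suppressed = false;
    for (size_t i = 0; i < lanes; i++) {
        ffr[i] = !suppressed;   /* FFR starts all-true; cleared from the suppressed lane on */
        dst[i] = 0;             /* inactive lanes are zeroed; unloaded lanes are not
                                   guaranteed by the architecture, zero in this model */
        if (suppressed || !pg[i])
            continue;
        if (!readable[i]) {
            if (!seen_first_active)
                return false;   /* the first active lane faults normally */
            suppressed = true;  /* a later fault is suppressed and recorded in FFR */
            ffr[i] = false;
            continue;
        }
        seen_first_active = true;
        dst[i] = src[i];
    }
    return true;
}
```

In this model, an all-true predicate over 8 lanes whose last 3 addresses are unmapped loads lanes 0 to 4 and clears FFR for lanes 5 to 7; the loop then processes only the lanes that FFR reports as loaded.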
1 vote, 0 answers

Convert column major matrix to row major matrix

I have a column-major matrix and I want to convert it to a row-major matrix using Arm SVE instructions. I know about the gather and scatter instructions, but they are not good enough for my case. Does anyone have an idea?
shb8086
1 vote, 1 answer

Why is the maximum register length of SVE 2048 bits?

I was looking at ARM's SVE recently, and I was wondering why the maximum register length in SVE is 2048 bits, and what the problem would be if it were larger.
zbc2468
1 vote, 1 answer

AArch64 SVE/2 - Left pack elements from list

I'm trying to implement a SIMD algorithm with AArch64 SVE (or SVE2) that takes a list of elements and selects only the ones that meet a certain condition. This is often called left packing (SSE/AVX/AVX-512) or stream compaction (CUDA). Is it possible…
him
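One way this is commonly approached: base SVE has a COMPACT instruction (`svcompact` in ACLE, for 32- and 64-bit elements) that moves the active elements of a vector, in order, to the low lanes and zeroes the rest. Combined with a compare (`svcmpgt`), a lane count (`svcntp_b32`) and a store predicated by `svwhilelt_b32(0, count)`, that gives a left-pack loop. The sketch below is a plain-C scalar model of that loop, assuming a "keep elements greater than a threshold" condition; `compact_model`, `left_pack_gt`, and `MAX_LANES` are hypothetical names, and the comments only indicate which intrinsic each step stands for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES (2048 / 32)   /* most 32-bit lanes in any SVE vector */

/* Model of COMPACT: copy active lanes, in order, to the low lanes of dst and
 * zero the rest. Returns the active count (what svcntp_b32 would report). */
static size_t compact_model(const int32_t *src, const bool *pg, size_t lanes,
                            int32_t *dst) {
    size_t k = 0;
    for (size_t i = 0; i < lanes; i++)
        if (pg[i])
            dst[k++] = src[i];
    for (size_t i = k; i < lanes; i++)
        dst[i] = 0;
    return k;
}

/* Left-pack every in[i] > threshold into out; returns the number kept
 * (also 0 when vl_bits is not a legal vector length). */
size_t left_pack_gt(const int32_t *in, size_t n, int32_t threshold,
                    int32_t *out, unsigned vl_bits) {
    if (vl_bits < 128 || vl_bits > 2048 || vl_bits % 128 != 0)
        return 0;
    size_t lanes = vl_bits / 32;
    size_t kept = 0;
    for (size_t i = 0; i < n; i += lanes) {
        bool pg[MAX_LANES];
        int32_t vec[MAX_LANES], packed[MAX_LANES];
        for (size_t l = 0; l < lanes; l++) {
            bool in_bounds = i + l < n;               /* svwhilelt_b32(i, n) */
            vec[l] = in_bounds ? in[i + l] : 0;       /* zeroing svld1 */
            pg[l] = in_bounds && vec[l] > threshold;  /* svcmpgt under the loop predicate */
        }
        size_t k = compact_model(vec, pg, lanes, packed);  /* svcompact + svcntp_b32 */
        for (size_t l = 0; l < k; l++)                /* svst1 under svwhilelt_b32(0, k) */
            out[kept + l] = packed[l];
        kept += k;
    }
    return kept;
}
```

For the input {5, -1, 7, 0, 9, 2, -3, 8, 1, 6} and threshold 4, this model produces {5, 7, 9, 8, 6} regardless of the simulated vector length. For 8- or 16-bit elements, base SVE's COMPACT does not apply directly, so the data would need widening or a different technique.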
1 vote, 2 answers

SVE / SVE2 support in GNU toolchain

I want to write SVE/SVE2 code (assembly and/or C intrinsics). Which version of the GNU toolchain supports SVE / SVE2? I am also interested in auto-vectorization, if that is supported.
Oak Bytes
1 vote, 1 answer

How to assemble ARM SVE instructions with GNU GAS or LLVM and run it on QEMU?

I want to play with the new ARM SVE instructions using open source tools. As a start, I would like to assemble the minimal example present at: https://developer.arm.com/docs/dui0965/latest/getting-started-with-the-sve-compiler/assembling-sve-code //…
Ciro Santilli
0 votes, 0 answers

Enable SVE instructions

I am in EL1. The EL3 code is supposed to have enabled SVE using CPTR_EL3 for all exception levels. I am trying to do a simple FMOV to SVE registers, and it crashes. What else should I check or set for SVE instructions to be available? This is compiled using…
user2346536
0 votes, 1 answer

Software optimization guide for AArch64 Neon and SVE

There is an Arm software optimization guide (e.g., https://developer.arm.com/documentation/swog309707/latest for Neoverse N1). This guide doesn't seem to contain the latency and throughput for Neon or SVE instructions. Is there a separate guide for NEON or SVE…
minglotus
0 votes, 1 answer

In ARMv8, what is the effect of the assembly instruction "ptrue p0.b vl64"?

I have also come across these instructions: ptrue p0.s, ptrue p0.d, ptrue p0.b vl64, ptrue p0.b vl32. What are their effects and differences?
wxmwy
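A hedged sketch of what those forms mean: `ptrue` with no pattern activates every element of the given size, while a `vlN` pattern activates exactly the first N elements, or none at all if the vector has fewer than N elements of that size. In ACLE these correspond to calls such as `svptrue_b32()` and `svptrue_pat_b8(SV_VL64)`. The plain-C model below is an illustration, not Arm's pseudocode; `ptrue_count` and `ptrue_bits` are hypothetical names, and the other patterns (POW2, MUL3, MUL4) are deliberately not modelled. It also shows why `.b`, `.s` and `.d` differ: a predicate register has one bit per vector byte, and an element owns a group of those bits of which only the lowest is set.

```c
#include <stdbool.h>

/* Hypothetical model of PTRUE's active-element count.
 * vl_bits: vector length; esize_bits: 8 (.b), 16 (.h), 32 (.s) or 64 (.d);
 * n: the VLn pattern value (1-8, 16, 32, 64, 128, 256), or 0 for no pattern (ALL).
 * POW2, MUL3 and MUL4 patterns are not modelled. */
unsigned ptrue_count(unsigned vl_bits, unsigned esize_bits, unsigned n) {
    unsigned elements = vl_bits / esize_bits;
    if (n == 0)
        return elements;             /* e.g. ptrue p0.s: every 32-bit lane active */
    return n <= elements ? n : 0;    /* VLn: n lanes, or none if the vector is too short */
}

/* A predicate register has one bit per vector byte. An element of esize_bits
 * owns esize_bits / 8 of those bits; only the lowest is set when it is active. */
void ptrue_bits(unsigned vl_bits, unsigned esize_bits, unsigned n, bool *bits) {
    unsigned nbits = vl_bits / 8;
    unsigned step = esize_bits / 8;
    unsigned active = ptrue_count(vl_bits, esize_bits, n);
    for (unsigned b = 0; b < nbits; b++)
        bits[b] = (b % step == 0) && (b / step < active);
}
```

Under this model, `ptrue p0.b, vl64` activates 64 byte lanes on a 512-bit or wider implementation but yields an all-false predicate on a 256-bit one, whereas `ptrue p0.s` activates every 32-bit lane whatever the vector length.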