Data type compatibility with NEON intrinsics

Question

I am working on ARM optimizations using the NEON intrinsics, from C++ code. I understand and master most of the typing issues, but I am stuck on this one:

The instruction vzip_u8 returns a uint8x8x2_t value (in fact an array of two uint8x8_t). I want to assign the returned value to a plain uint16x8_t. I see no appropriate vreinterpretq intrinsic to achieve that, and simple casts are rejected.

Just discovered in a manual that uint8x8x2_t is called a "Vector array data type". But still no idea how to convert. — , Dec 04 '12 at 20:46
Can you publish a small snippet to demonstrate your question? — auselen, Dec 07 '12 at 12:21

score 6 · Answer 1 · answered Dec 06 '12 at 00:07

6

Some definitions to answer clearly...

NEON has 32 registers, 64-bits wide (dual view as 16 registers, 128-bits wide).

The NEON unit can view the same register bank as:

sixteen 128-bit quadword registers, Q0-Q15

thirty-two 64-bit doubleword registers, D0-D31.

uint16x8_t is a type which requires 128-bit storage thus it needs to be in an quadword register.

ARM NEON Intrinsics has a definition called vector array data type in ARM® C Language Extensions:

... for use in load and store operations, in table-lookup operations, and as the result type of operations that return a pair of vectors.

vzip instruction

... interleaves the elements of two vectors.

vzip Dd, Dm

and has an intrinsic like

uint8x8x2_t vzip_u8 (uint8x8_t, uint8x8_t)

from these we can conclude that uint8x8x2_t is actually a list of two random numbered doubleword registers, because vzip instructions doesn't have any requirement on order of input registers.

Now the answer is...

uint8x8x2_t can contain non-consecutive two dualword registers while uint16x8_t is a data structure consisting of two consecutive dualword registers which first one has an even index (D0-D31 -> Q0-Q15).

Because of this you can't cast vector array data type with two double word registers to a quadword register... easily.

Compiler may be smart enough to assist you, or you can just force conversion however I would check the resulting assembly for correctness as well as performance.

answered Dec 06 '12 at 00:07

auselen

27,577
7
73
114

Things are clearer now, thanks. uint8x8x2_t must be a "pseudo-type", as it describes possibly non-contiguous data. But my initial question remains: "you can just force conversion". How ??? – Dec 06 '12 at 17:03
But as i said, compiler can fix that for you. Problem is performance might suffer. You should check the binary. – auselen Dec 06 '12 at 17:54
My problem was I couldn't find a way to make the compiler accept a cast. – Dec 07 '12 at 09:44
I thought you did as in your own answer. – auselen Dec 07 '12 at 10:31
The fact is that I don't trust this solution, which could be working "by accident". I'll have to validate it deeper... – Dec 07 '12 at 11:53
In the guide you refer to (Section 12.2.2), int16x4x2_t is defined as struct int16x4x2_t { int16x4_t val[2]; };. When designing an alternative casting operator, is it correct to treat the 2 arrays as if they were in consecutive memory locations (since they are declared to be stored in an array of arrays)? – Antonio Mar 25 '15 at 14:52
@Antonio Afaik this is mostly for casting registers how things layout in memory might be a completely different story. Check VLDM instruction. – auselen Mar 25 '15 at 16:09
Could you please take a look at [this](http://stackoverflow.com/questions/29208668/using-an-union-encapsulated-in-a-struct-to-bypass-conversions-for-neon-data-ty/29213705#29213705) and see if it makes any sense? – Antonio Mar 25 '15 at 16:50

score 4 · Answer 2 · answered Sep 22 '14 at 21:34

4

You can construct a 128 bit vector from two 64 bit vectors using the vcombine_* intrinsics. Thus, you can achieve what you want like this.

#include <arm_neon.h>

uint8x16_t f(uint8x8_t a, uint8x8_t b)
{
    uint8x8x2_t tmp = vzip_u8(a,b);
    uint8x16_t result;
    result = vcombine_u8(tmp.val[0], tmp.val[1]);
    return result;
}

answered Sep 22 '14 at 21:34

Charles Baylis

851
7
8

Interesting, thanks. I'll look into it (but most probably not soon, this is an sleeping project). – Sep 23 '14 at 07:01

score 1 · Answer 3 · answered Dec 05 '12 at 07:25

1

I have found a workaround: given that the val member of the uint8x8x2_t type is an array, it is therefore seen as a pointer. Casting and deferencing the pointer works ! [Whereas taking the address of the data raises an "address of temporary" warning.]

uint16x8_t Value= *(uint16x8_t*)vzip_u8(arg0, arg1).val;

It turns out that this compiles and executes as should (at least in the case I have tried). I haven't looked at the assembly code so I cannot grant it is implemented properly (I mean just keeping the value in a register instead of writing/read to/from memory.)

answered Dec 05 '12 at 07:25

I am no expert, but I think that will [violate strict aliasing rules](http://stackoverflow.com/a/29253236/2436175). Plus the risk [mentioned by auselan](http://stackoverflow.com/a/13734838/2436175) that the 2 vectors of vectors are not coupled in the same 128bit register. – Antonio Mar 25 '15 at 12:01
From a pure declarative point of view, the two vectors in the struct must be contiguous, and this is innocuous. So we should assume that the compiler will handle the appropriate transfers between memory and registers. – Mar 25 '15 at 14:24
You are right on this point. Yet, technically there's a strict aliasing rules violation. – Antonio Mar 25 '15 at 14:53

score 0 · Answer 4 · edited May 23 '17 at 12:02

0

I was facing the same kind of problem, so I introduced a flexible data type.

I can now therefore define the following:

typedef NeonVectorType<uint8x16_t> uint_128bit_t; //suitable for uint8x16_t, uint8x8x2_t, uint32x4_t, etc.
typedef NeonVectorType<uint8x8_t> uint_64bit_t; //suitable for uint8x8_t, uint32x2_t, etc.

edited May 23 '17 at 12:02

Community

1
1

answered Mar 24 '15 at 10:04

Antonio

19,451
13
99
197

score -1 · Answer 5 · answered Sep 04 '13 at 16:25

-1

Its a bug in GCC(now fixed) on 4.5 and 4.6 series.

Bugzilla link http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48252

Please take the fix from this bug and apply to gcc source and rebuild it.

answered Sep 04 '13 at 16:25

BHS

991
3
12
26

Hi. Thanks for contributing to this post. It does not address the current issue, which is a matter of syntactic/semantic incompatibility, and not a code generation bug in the compiler, though. – Sep 05 '13 at 09:18

Data type compatibility with NEON intrinsics

5 Answers5

Linked