Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is a vector-processing instruction set for ARM processors. It's also known as Advanced SIMD (Single Instruction Multiple Data).
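To make the "single instruction, multiple data" idea concrete, here is a minimal sketch in C using intrinsics from `arm_neon.h`. The helper name `add_f32` and the `HAVE_NEON` macro are hypothetical, introduced only for illustration. The sketch assumes the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is enabled, and falls back to plain scalar code otherwise.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro, for this sketch only */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    /* Each vaddq_f32 adds four 32-bit floats in one operation. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);   /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);   /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when NEON is absent. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Calling `add_f32(out, a, b, 6)` would process the first four elements with one vector add on a NEON target and the remaining two in the scalar loop. On other targets the scalar loop does all the work, so the result is the same either way.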

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either [arm] for AArch32 or [arm64] for AArch64.

The [simd] tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in. Where relevant, also consider a tag for how you access the instructions.

More information at

  1. Neon page in ARM website
  2. Wikipedia article on ARM
885 questions
48 votes, 4 answers

ARM Cortex-A8: Whats the difference between VFP and NEON

On the ARM Cortex-A8 processor, I understand what NEON is: it is a SIMD co-processor. But does the VFP (Vector Floating Point) unit, which is also a co-processor, work as a SIMD processor? If so, which one is better to use? I read a few links such as…
HaggarTheHorrible
31 votes, 5 answers

Why ARM NEON not faster than plain C++?

Here is a C++ code: #define ARR_SIZE_TEST ( 8 * 1024 * 1024 ) void cpp_tst_add( unsigned* x, unsigned* y ) { for ( register int i = 0; i < ARR_SIZE_TEST; ++i ) { x[ i ] = x[ i ] + y[ i ]; } } Here is a neon version: void…
Smalti
29 votes, 4 answers

Is there a good reference for ARM Neon intrinsics?

The ARM reference manual doesn't go into too much detail into the individual instructions ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348b/BABIIBBG.html ). Is there something that's a little more detailed?
Vineeth
22 votes, 7 answers

Coding for ARM NEON: How to start?

I'm looking to optimize C++ code (mainly some for loops) using the NEON capability of computing 4 or 8 array elements at a time. Is there some kind of library or set of functions that can be used in C++ environment? I use Eclipse IDE in Linux Gentoo…
Pedro Batista
21 votes, 3 answers

Cortex A9 NEON vs VFP usage confusion

I'm trying to build a library for a Cortex-A9 ARM processor (an OMAP4, to be more specific) and I'm a little confused about which one to use, and when: NEON or VFP, in the context of floating point operations and SIMD. To be noted that I know the…
celavek
18 votes, 4 answers

C vs assembler vs NEON performance

I am working on an iPhone application that does real time image processing. One of the earliest steps in its pipeline is to convert a BGRA image to greyscale. I tried several different methods and the difference in timing results is far greater…
Hammer
17 votes, 3 answers

Android ARMv6/v7 and VFP/NEON

I would like to better understand the CPUs used in Android phones. The reason is that we are building a C library that has certain CPU/math processor architecture flags we can set. So far we have found that all Android devices CPUs are ARM…
STeN
16 votes, 1 answer

Divide by floating-point number using NEON intrinsics

I'm processing an image four pixels at a time, on an ARMv7 for an Android application. I want to divide a float32x4_t vector by another vector, but the numbers in it vary from circa 0.7 to 3.85, and it seems to me that the only way to…
Darkmax
16 votes, 2 answers

gcc; arm64; aarch64; unrecognized command line option '-mfpu=neon'

I got a compilation error: unrecognized command line option '-mfpu=neon' when I tried to compile with the -mfpu=neon flag. Actually, any 'mfpu' option I tried failed. However, this flag is mentioned in the documentation, so it should be valid. What is…
user3124812
16 votes, 2 answers

Common SIMD techniques

Where can I find information about common SIMD tricks? I have an instruction set and know how to write non-tricky SIMD code, but I know SIMD is now much more powerful. It can handle complex conditional branchless code. For example (ARMv6), the…
zxcat
16 votes, 5 answers

Fast 4x4 Matrix Multiplication in C

I am trying to find an optimized C or Assembler implementation of a function that multiplies two 4x4 matrices with each other. The platform is an ARM6 or ARM7 based iPhone or iPod. Currently, I am using a fairly standard approach - just a little…
Till
16 votes, 4 answers

Methods to vectorise histogram in SIMD?

I am trying to implement a histogram in NEON. Is it possible to vectorise it?
Rugger
15 votes, 3 answers

what is the fastest FFT library for iOS/Android ARM devices?

What is the fastest FFT library for iOS/Android ARM devices? And what library do people typically use on iOS/Android platforms? I'm guessing vDSP is the library most frequently used on iOS. EDIT: my code is at http://anthonix.com/ffts and uses the…
Anthony Blake
15 votes, 1 answer

ARM to C calling convention, NEON registers to save

There is a similar post that covers regular registers. What about NEON registers? As far as I remember, either the top half or the bottom half of the registers has to be preserved across function calls. I can't find that info anywhere; can somebody clarify…
Pavel P
15 votes, 3 answers

How to use the multiply and accumulate intrinsics in ARM Cortex-a8?

How do I use the multiply-accumulate intrinsics provided by GCC? float32x4_t vmlaq_f32 (float32x4_t , float32x4_t , float32x4_t); Can anyone explain which three parameters I have to pass to this function? I mean the source and destination registers…
HaggarTheHorrible