I'm trying to build an FFT -> signal manipulation -> inverse FFT pipeline using Project NE10 in my C++ project, converting the complex output to amplitudes and phases after the FFT and back again before the IFFT. According to the benchmarks, my C++ code performs much worse than the SIMD-enabled NE10 code. Since I have no experience with ARM assembly, I'm looking for help writing NEON code for the unoptimised C module. For example, before the IFFT I do this:
for (int bin = 0; bin < NUM_FREQUENCY_BINS; bin++) {
    input[bin].real = amplitudes[bin] * cosf(phases[bin]);
    input[bin].imag = amplitudes[bin] * sinf(phases[bin]);
}
where `input` is an array of C structs (for complex values), and `amplitudes` and `phases` are `float` arrays.
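For reference, here is a minimal sketch of both conversions described above: complex bins to amplitudes and phases after the FFT, and back again before the IFFT. The `ComplexF` struct and the function names `toPolar` and `toCartesian` are hypothetical stand-ins (the real struct is not shown in the question), and this is plain scalar C++, not NEON code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the complex struct used in the question.
struct ComplexF {
    float real;
    float imag;
};

// After the FFT: complex bins -> amplitudes and phases.
void toPolar(const std::vector<ComplexF>& bins,
             std::vector<float>& amplitudes,
             std::vector<float>& phases) {
    amplitudes.resize(bins.size());
    phases.resize(bins.size());
    for (std::size_t bin = 0; bin < bins.size(); ++bin) {
        amplitudes[bin] = std::hypot(bins[bin].real, bins[bin].imag);
        phases[bin] = std::atan2(bins[bin].imag, bins[bin].real);
    }
}

// Before the IFFT: amplitudes and phases -> complex bins
// (the same arithmetic as the loop in the question).
void toCartesian(const std::vector<float>& amplitudes,
                 const std::vector<float>& phases,
                 std::vector<ComplexF>& bins) {
    assert(amplitudes.size() == phases.size());
    bins.resize(amplitudes.size());
    for (std::size_t bin = 0; bin < amplitudes.size(); ++bin) {
        bins[bin].real = amplitudes[bin] * std::cos(phases[bin]);
        bins[bin].imag = amplitudes[bin] * std::sin(phases[bin]);
    }
}
```

A round trip through `toPolar` and then `toCartesian` should reproduce the original bins up to floating-point rounding, which makes a handy correctness check for any optimised replacement.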
The above block (O(n) complexity) takes about 0.6 ms for 8192 bins, while the NE10 FFT (O(n log n) complexity) takes only 0.1 ms because of SIMD operations. From what I've read so far on StackOverflow and elsewhere, intrinsics are not worth the effort, so I'd like to write this in ARM NEON assembly only.
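To make a figure like the 0.6 ms above reproducible, the loop can be timed with a small harness such as the sketch below. This is only an illustration of one way to measure it, not how the numbers in this question were obtained: `ComplexF` and `meanConversionMs` are hypothetical names, and an optimising compiler may in principle simplify the repeated runs, so the result should be treated as a rough estimate.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the complex struct used in the question.
struct ComplexF {
    float real;
    float imag;
};

// Runs the polar-to-Cartesian loop `repeats` times and returns the
// mean wall-clock time of one run in milliseconds.
double meanConversionMs(const std::vector<float>& amplitudes,
                        const std::vector<float>& phases,
                        std::vector<ComplexF>& out,
                        int repeats) {
    out.resize(amplitudes.size());
    const auto start = std::chrono::steady_clock::now();
    for (int run = 0; run < repeats; ++run) {
        for (std::size_t bin = 0; bin < amplitudes.size(); ++bin) {
            out[bin].real = amplitudes[bin] * std::cos(phases[bin]);
            out[bin].imag = amplitudes[bin] * std::sin(phases[bin]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Calling it with two 8192-element vectors and a few hundred repeats gives a per-run average that can be compared before and after any NEON rewrite, on the same device and with the same compiler flags.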