I am new to NEON intrinsics. I have two arrays of 99 elements each, which I am trying to add element-wise using NEON intrinsics. Since 99 is not a multiple of 8, 16, or 32, only 96 elements can be handled with full vectors. How do I handle the remaining 3 elements? Please help. Here is the code that I have written:
#include <arm_neon.h>
#define SIZE 99
void addition(unsigned char A[],unsigned char B[],unsigned short int *addres)
{
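    uint8x8_t v, v1;
    int i = 0;
    // note: the last iteration (i = 96) touches elements 96..103, past the end of the 99-element arrays
    for (i = 0; i < SIZE; i = i + 8) {
        v = vld1_u8(&A[i]);             // load 8 bytes of A into a vector
        v1 = vld1_u8(&B[i]);            // load 8 bytes of B into a vector
        uint16x8_t t = vaddl_u8(v, v1); // widening add: 8 x u8 + 8 x u8 -> 8 x u16
        vst1q_u16(addres + i, t);       // store the vector back to memory
    }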
}
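One common way to deal with the leftover elements is to run the vector loop only while a full group of 8 remains, then finish the tail with a plain scalar loop. The following is a minimal sketch under that assumption, not a definitive implementation. The function name `add_u8_widen`, the length parameter `n`, and the `HAVE_NEON` guard are hypothetical additions that do not appear in the code above; the guard only lets the same file build on a non-NEON host, where the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1 /* hypothetical guard macro, used only in this sketch */
#endif

#define SIZE 99

/* Widening element-wise add: out[i] = a[i] + b[i] for i in [0, n).
   Works for any n, including lengths that are not a multiple of 8. */
void add_u8_widen(const uint8_t *a, const uint8_t *b, uint16_t *out, size_t n)
{
    size_t i = 0;

#ifdef HAVE_NEON
    /* Vector body: runs only while a full group of 8 elements remains,
       so no load or store ever goes past index n - 1. */
    for (; i + 8 <= n; i += 8) {
        uint8x8_t va = vld1_u8(a + i);
        uint8x8_t vb = vld1_u8(b + i);
        vst1q_u16(out + i, vaddl_u8(va, vb)); /* 8 x u8 -> 8 x u16 */
    }
#endif

    /* Scalar tail: the remaining n % 8 elements (3 when n is 99). */
    for (; i < n; ++i) {
        out[i] = (uint16_t)(a[i] + b[i]);
    }
}
```

It would be called as `add_u8_widen(A, B, addres, SIZE);`. Two other approaches are sometimes used instead of a scalar tail: padding the arrays up to the next multiple of 8 (104 here), or, when the output buffer is separate from the inputs as it is in this case, running one final vector iteration that starts at `n - 8` and recomputes a few elements that were already written.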