I want to write assembly code that uses both ARM general-purpose instructions and ASIMD instructions in parallel. My first question is whether this can be done on ARMv8. Based on this thread, it is possible on ARMv7, but data transfers between NEON and ARM registers take a considerable amount of time. Second, I am looking for a way to make the two kinds of instructions execute in parallel. Here is what I am trying to do:
.
.
.
<ASIMD instruction>
<ASIMD instruction>
<ASIMD instruction>
<Data MOV between ASIMD vectors and ARM Reg>
<ARM assembly instruction> ------- <ASIMD instruction>
<ARM assembly instruction> ------- <ASIMD instruction>
<ARM assembly instruction> ------- <ASIMD instruction>
<Data MOV between ARM Reg and ASIMD vectors>
<ARM assembly instruction> ------- <ASIMD instruction>
<ARM assembly instruction> ------- <ASIMD instruction>
<ARM assembly instruction> ------- <ASIMD instruction>
.
.
.
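To make the layout above concrete, here is a rough single-thread sketch in AArch64 (GNU assembler syntax) where the scalar and ASIMD instructions are simply interleaved in one instruction stream. The register roles (x0 as input pointer, x1 as output pointer) and the particular operations are placeholders I made up for illustration, not my real code. Whether the paired instructions actually overlap would depend on the core's issue rules, which is part of what I am asking.

```asm
// Hypothetical register roles (assumptions for illustration only):
//   x0 = address of eight 32-bit input words
//   x1 = address of a 16-byte output buffer

    ld1     {v0.4s, v1.4s}, [x0]        // ASIMD: load two 128-bit vectors
    add     v2.4s, v0.4s, v1.4s         // ASIMD: lane-wise add
    mul     v3.4s, v2.4s, v0.4s         // ASIMD: lane-wise multiply
    mov     w2, v3.s[0]                 // ASIMD lane -> general-purpose register (alias of UMOV)

    add     w3, w2, w2, lsl #1          // scalar: w3 = 3 * w2
    eor     v4.16b, v0.16b, v1.16b      // ASIMD: independent of the scalar chain
    lsr     w3, w3, #2                  // scalar: logical shift right by 2
    shl     v4.4s, v4.4s, #1            // ASIMD: lane-wise shift left by 1

    mov     v4.s[0], w3                 // general-purpose register -> ASIMD lane (alias of INS)
    st1     {v4.4s}, [x1]               // ASIMD: store the result vector
```

The scalar chain (w2, w3) and the vector chain (v4) have no data dependency between the two transfer points, so in principle they could be issued side by side.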
I am wondering if I can do this using two threads. I am working on an ARM Cortex-A53 processor. I also have access to an ARM Cortex-A57, but I think these platforms are roughly the same and have equal capabilities.