I want to learn NEON, so I took the matrix multiplication example from the ARM website at:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0425/ch04s06s05.html
My plan was to get this running and then start experimenting with it (KISS). The program builds fine with GCC, but when I run it I get a 'Segmentation fault' as soon as the first VST instruction is executed. If I remove the VST instructions, the program runs to completion. Stepping through in GDB, everything else appears to work (register values etc.); the only failure is the store to memory.
Any guidance or help would be appreciated. The full source is:
.global main
.func main
main:
    .macro mul_col_f32 res_q, col0_d, col1_d
    vmul.f32 \res_q, q8, \col0_d[0]   @ multiply col element 0 by matrix col 0
    vmla.f32 \res_q, q9, \col0_d[1]   @ multiply-acc col element 1 by matrix col 1
    vmla.f32 \res_q, q10, \col1_d[0]  @ multiply-acc col element 2 by matrix col 2
    vmla.f32 \res_q, q11, \col1_d[1]  @ multiply-acc col element 3 by matrix col 3
    .endm

    LDR R0, =result0a
    LDR R1, =result1a
    LDR R2, =result2a

    vld1.32 {d16-d19}, [r1]!   @ load the first eight elements of matrix 0
    vld1.32 {d20-d23}, [r1]!   @ load the second eight elements of matrix 0
    vld1.32 {d0-d3}, [r2]!     @ load the first eight elements of matrix 1
    vld1.32 {d4-d7}, [r2]!     @ load the second eight elements of matrix 1

    mul_col_f32 q12, d0, d1    @ matrix 0 * matrix 1 col 0
    mul_col_f32 q13, d2, d3    @ matrix 0 * matrix 1 col 1
    mul_col_f32 q14, d4, d5    @ matrix 0 * matrix 1 col 2
    mul_col_f32 q15, d6, d7    @ matrix 0 * matrix 1 col 3

    vst1.32 {d24-d27}, [r0]!   @ store first eight elements of result.
    vst1.32 {d28-d31}, [r0]!   @ store second eight elements of result.

    MOV R7, #1
    SWI 0

result1a: .word 0xFFFFFFFF @ d16
result1b: .word 0xEEEEEEEE @ d16
result1c: .word 0xDDDDDDDD @ d17
result1d: .word 0xCCCCCCCC @ d17
result1e: .word 0xBBBBBBBB @ d18
result1f: .word 0xAAAAAAAA @ d18
result1g: .word 0x99999999 @ d19
result1h: .word 0x88888888 @ d19
result2a: .word 0x77777777 @ d0
result2b: .word 0x66666666 @ d0
result2c: .word 0x55555555 @ d1
result2d: .word 0x44444444 @ d1
result2e: .word 0x33333333 @ d2
result2f: .word 0x22222222 @ d2
result2g: .word 0x11111111 @ d3
result2h: .word 0x0F0F0F0F @ d3
result0a: .word 0x0 @ R0
result0b: .word 0x0 @ R0
result0c: .word 0x0 @ R0
result0d: .word 0x0 @ R0
result0e: .word 0x0 @ R0
result0f: .word 0x0 @ R0
result0g: .word 0x0 @ R0
result0h: .word 0x0 @ R0
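
For reference when comparing against the register values in GDB, this is a plain C sketch of what I understand the four mul_col_f32 expansions to compute. It assumes 4x4 single-precision matrices stored column by column, as the comments in the ARM example suggest; the function name mat4_mul_ref is only mine for illustration and does not come from the ARM page.

```c
#include <stddef.h>

/* Scalar reference for the NEON routine above (a sketch, not ARM's code).
 * Assumption: matrices are 4x4 floats in column-major order, so element
 * (row, col) is stored at index col * 4 + row.
 * Computes result = m0 * m1. */
void mat4_mul_ref(float result[16], const float m0[16], const float m1[16])
{
    for (size_t col = 0; col < 4; ++col) {          /* one mul_col_f32 per column */
        for (size_t row = 0; row < 4; ++row) {      /* one lane of the q register */
            float acc = 0.0f;
            for (size_t k = 0; k < 4; ++k) {
                /* column k of m0 (q8..q11) scaled by element k of m1's column */
                acc += m0[k * 4 + row] * m1[col * 4 + k];
            }
            result[col * 4 + row] = acc;
        }
    }
}
```

With the layout above, result[0..3] corresponds to q12, result[4..7] to q13, and so on, which is the order in which the two VST instructions would write them out.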