I am running a program on a bare-metal ARM core (ARMv5TE-compliant), debugging it with gdb over a JTAG connector. The program runs from SDRAM in supervisor mode and uses only ARM (not Thumb) instructions.

At some point an exception occurs. When I halt the program from gdb with Ctrl+C, the CPSR indicates undefined-instruction mode, but the program counter points at the software interrupt vector (0xffff0008). According to the ARM ARM, when an undefined instruction exception occurs the PC should be set to 0xffff0004 or 0x00000004. What is happening to my program: did a SWI occur, or an undefined instruction exception?

Edit, to make my question clearer:

The purpose of my program is to test the hardware of a custom board. When there is a hardware problem, the program in RAM can get corrupted (as can be seen below), and that corruption is what triggers the exception. When the hardware is fine, the test software runs without problems. My RAM addresses range from 0 to 0x40000000, and the program is loaded between 0x1000 and 0x2000. The supervisor mode stack pointer is set to 0xff0. The exception vector table consists only of breakpoints.

(gdb) c
Continuing.
^C^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0xffff0008 in ?? ()

Registers from the undefined exception mode:

(gdb) i r 
r0             0x52878  338040
r1             0x2020000    33685504
r2             0x2020000    33685504
r3             0x2020000    33685504
r4             0x2020000    33685504
r5             0x2020000    33685504
r6             0x2020000    33685504
r7             0x2020000    33685504
r8             0x2020000    33685504
r9             0x2020000    33685504
r10            0x2020000    33685504
r11            0x2020000    33685504
r12            0x2020000    33685504
sp             0x2020000    0x2020000
lr             0xffff0008   4294901768
pc             0xffff0008   0xffff0008
fps            0x0  0
cpsr           0x800000db   2147483867

Registers from the supervisor mode:

(gdb) set $cpsr=0xd3
(gdb) i r
r0             0x52878  338040
r1             0x2020000    33685504
r2             0x2020000    33685504
r3             0x2020000    33685504
r4             0x2020000    33685504
r5             0x2020000    33685504
r6             0x2020000    33685504
r7             0x2020000    33685504
r8             0x2020000    33685504
r9             0x2020000    33685504
r10            0x2020000    33685504
r11            0x2020000    33685504
r12            0x2020000    33685504
sp             0xff3ffffe   0xff3ffffe
lr             0x1020   4128
pc             0xffff0008   0xffff0008
fps            0x0  0
cpsr           0xd3 211 
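
To make the two CPSR values above easier to read, here is a minimal Python sketch that splits a CPSR word into its mode field and a few flag bits. The mode encodings are the standard ARM ones; the function and dictionary names are only illustrative.

```python
# ARM CPSR mode field (bits 4:0) for the standard processor modes.
ARM_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields relevant to this question."""
    return {
        "mode": ARM_MODES.get(cpsr & 0x1F, "reserved"),
        "thumb": bool(cpsr & (1 << 5)),         # T bit: set when in Thumb state
        "fiq_disabled": bool(cpsr & (1 << 6)),  # F bit
        "irq_disabled": bool(cpsr & (1 << 7)),  # I bit
        "negative": bool(cpsr & (1 << 31)),     # N condition flag
    }


if __name__ == "__main__":
    # The two values seen in the gdb sessions above.
    for value in (0x800000DB, 0xD3):
        print(hex(value), decode_cpsr(value))
```

For 0x800000db the mode field is 0b11011 (Undefined) with the T bit clear, and for 0xd3 it is 0b10011 (Supervisor), which matches the way the two register dumps are labelled.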

Here is the (corrupted) program in RAM around the address pointed to by the supervisor-mode link register:

(gdb) x/5i 0x1020-8
0x1018 <_start+24>: bic r0, r0, #135168 ; 0x21000
0x101c <_start+28>: strbcs  r0, [r0], #1025
0x1020 <_start+32>: mcr 15, 0, r0, cr1, cr0, {0}
0x1024 <_start+36>: ldr r1, [pc, #120]  ; 0x10a4 <skip_intreg_reset+100>
0x1028 <_start+40>: ldr r2, [r1, #8]

(gdb) x/4w 0x1018
0x1018 <_start+24>: 0xe3c00a01  
0x101C <_start+28>: 0xfec00401
0x1020 <_start+32>: 0xee010f10  
0x1024 <_start+36>: 0xe59f1078

Dump of the same code from the program object file:

  18:   e3c00a01    bic r0, r0, #4096   ; 0x1000
  1c:   e3c00001    bic r0, r0, #1  ; 0x1
  20:   ee010f10    mcr 15, 0, r0, cr1, cr0, {0}
  24:   e59f1078    ldr r1, [pc, #120]  ; a4 <skip_intreg_reset+0x64>
  28:   e5912000    ldr r2, [r1]
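
The corruption can be located bit by bit by XOR-ing the words read back from RAM against the words in the object file. The following is a rough Python sketch of that comparison, using the four words shown in the `x/4w` dump and the object-file dump above; the helper names are hypothetical.

```python
def flipped_bits(expected, observed):
    """Return the bit positions (0 = LSB) where two 32-bit words differ."""
    diff = (expected ^ observed) & 0xFFFFFFFF
    return [bit for bit in range(32) if diff & (1 << bit)]


# Words from the object-file dump, placed at the load address shown by gdb.
EXPECTED = {
    0x1018: 0xE3C00A01,
    0x101C: 0xE3C00001,
    0x1020: 0xEE010F10,
    0x1024: 0xE59F1078,
}

# Words read back from RAM with x/4w.
OBSERVED = {
    0x1018: 0xE3C00A01,
    0x101C: 0xFEC00401,
    0x1020: 0xEE010F10,
    0x1024: 0xE59F1078,
}


def compare(expected, observed):
    """Map each mismatching address to the list of bit positions that differ."""
    return {
        addr: flipped_bits(word, observed[addr])
        for addr, word in expected.items()
        if word != observed[addr]
    }


if __name__ == "__main__":
    for addr, bits in compare(EXPECTED, OBSERVED).items():
        print(hex(addr), "differs in bits", bits)
```

With these values only the word at 0x101c differs, in bits 10, 24, 26, 27 and 28. Running the same comparison over a larger region might show whether the same data lines are involved each time, which would point at a specific connection rather than a random fault.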
Étienne
  • was it in thumb mode or arm mode when the swi occurred? – old_timer Mar 15 '13 at 13:39
  • can you dump/post the code where the exception occurred? – old_timer Mar 15 '13 at 13:43
  • Could be just a GDB glitch. Try setting breakpoint on the target of the branch from the vector, not on the vector itself. – Igor Skochinsky Mar 15 '13 at 14:27
  • What do you mean with branch from the vector? To use a branch instruction in the interruption vector and put a breakpoint where I branch? – Étienne Mar 15 '13 at 14:36
  • Could you show dump of the code you have for exception vectors? – Joe Kul Mar 16 '13 at 02:19
  • The SDRAM ranges from 0 to 0x4000.0000. My stack is set to 0xFF0, and the program is loaded between 0x1000 and 0x3000. I can print the program but it is simply some random corrupted bits since it is a hardware problem, my program goal is to test if the hardware is OK or not. My exception vector is nothing more than breakpoints. – Étienne Mar 17 '13 at 22:07
  • you have verified the mcr is a valid operation with that coprocessor on this core? replace it with a nop perhaps. And the ldr is on an aligned boundary? (if 0x2020000 then its okay) – old_timer Mar 18 '13 at 21:32
  • The program runs without problem on the boards which have no hardware problems, the mcr operation is valid. – Étienne Mar 18 '13 at 22:03
  • so you have bad ram and want to figure out what ram is bad? or what hardware problem are you talking about, and what are you trying to solve with the debugger? – old_timer Mar 19 '13 at 01:28
  • @dwelch: Exactly I want to find which RAM/connection on the boards are bad. I have multiple identical boards and some of them have a problem, which I strongly suspect to be a RAM problem, the only goal of the software is to test them. – Étienne Mar 19 '13 at 08:56
  • have you no flash from which to run a memory test to find the bad ram, rather than try to execute into bad ram? Or since you have jtag use jtag to test the ram to find the problem. – old_timer Mar 19 '13 at 13:42
  • I found the problem: the exception vector address was actually configured high by the processor at reset. The exception vector table full of breakpoints that I loaded at address 0x0 was never used; the table actually used was in flash: `0xffff0000: b 0xffff0020 0xffff0004: b 0xffff0004 0xffff0008: b 0xffff0008 0xffff000c: b 0xffff000c 0xffff0010: b 0xffff0010 0xffff0014: b 0xffff0014 0xffff0018: b 0xffff0018 0xffff001c: b 0xffff001c` – Étienne Mar 19 '13 at 16:58
  • that 0xFFFF0008 address did look strange (and did look like high exception vectors)... – old_timer Mar 21 '13 at 20:37

1 Answer

This is a community wiki answer.

The issue was caused by two different problems:

  • The vector table was set up at the wrong address. The ARM has selectable low (0x00000000) and high (0xffff0000) exception vectors; here the high vectors were the default, whereas the program installed its vector table as if it were at 0x00000000. The high vector table, which was the one actually used, contained the following instructions (infinite loops on exceptions):
0xffff0000: b 0xffff0020 
0xffff0004: b 0xffff0004 
0xffff0008: b 0xffff0008 
0xffff000c: b 0xffff000c 
0xffff0010: b 0xffff0010 
0xffff0014: b 0xffff0014 
0xffff0018: b 0xffff0018 
0xffff001c: b 0xffff001c
  • The SDRAM problems on the board corrupted the program contents in RAM, which generated undefined instruction exceptions. The core then spun in the infinite loop at the undefined instruction vector (0xffff0004), the program stopped responding, and the OP halted it from gdb. The JTAG debugger used (PEEDI) actually advances to the next instruction when gdb is stopped with Ctrl+C, which is why the pc was 0xffff0008 even though the cpsr indicated an undefined instruction exception, whose vector is at 0xffff0004.
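
The relationship between the vector base and the addresses seen in the debugger can be sketched in a few lines of Python. The vector offsets are the standard ARM ones. The sketch assumes the core selects high vectors through the V bit (bit 13) of the CP15 c1 control register, as is common on ARMv5 cores, so check the core's reference manual before relying on it. All names are illustrative.

```python
# Offsets of the ARM exception vectors from the vector base address.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined instruction": 0x04,
    "software interrupt": 0x08,
    "prefetch abort": 0x0C,
    "data abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}

# Assumption: bit 13 (V) of the CP15 c1 control register selects high vectors.
V_BIT = 1 << 13


def vector_base(control_reg):
    """Return the vector base implied by a CP15 c1 control register value."""
    return 0xFFFF0000 if control_reg & V_BIT else 0x00000000


def vector_address(exception, control_reg):
    """Return the address the core branches to for the given exception."""
    return vector_base(control_reg) + VECTOR_OFFSETS[exception]


if __name__ == "__main__":
    high = V_BIT  # hypothetical control value with only the V bit set
    und = vector_address("undefined instruction", high)
    print(hex(und))      # vector taken on an undefined instruction
    print(hex(und + 4))  # next instruction, as reported after the halt
```

With high vectors selected, the undefined instruction vector is 0xffff0004, and the instruction after it is 0xffff0008, which is also the software interrupt vector. That coincidence is what made the halt look like a SWI. Clearing the V bit before writing the control register back (or installing the handlers at 0xffff0000) would be one way to make the table at 0x0 take effect.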
Étienne
artless noise