I have ARM assembly source code:

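        .global _start
        .text

    entry:  b _start            @ jump over the data that follows
    array:  .byte 10, 20, 25    @ three bytes of data
    eoa:                        @ label for the address just past the array

        .align                  @ pad so the next instruction is word-aligned

    _start:
        ldr r0, =eoa            @ r0 = address just past the last element
        ldr r1, =array          @ r1 = address of the first element
        mov r3, #0              @ r3 = running sum

    loop:
        ldrb r2, [r1], #1       @ load one byte, then advance r1 by 1
        add r3, r2, r3          @ sum += byte
        cmp r1, r0              @ reached the end of the array?
        bne loop                @ no: keep going

    stop:   b stop              @ spin forever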

It is a simple sum of an array. I assemble and link it using this sequence of commands in my Linux terminal:

    arm-none-eabi-as -o program.o program.s
    arm-none-eabi-ld -Ttext=0x0 -o program.elf program.o
    arm-none-eabi-objcopy -O binary program.elf program.bin

If I check the sizes of the generated files:

    ls -l
    -rwxrwxrwx 1 ziga ziga       48 Nov 22 12:38 program.bin
    -rwxrwxrwx 1 ziga ziga    66396 Nov 22 12:37 program.elf
    -rwxrwxrwx 1 ziga ziga      864 Nov 22 12:36 program.o
    -rwxrwxrwx 1 ziga ziga     1952 Jan  3 18:50 program.s

I see that if I strip the headers from my executable .elf, I get a .bin file that is exactly 48 bytes long. This means it can hold 12 ARM instructions, right?
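48 bytes is indeed 12 32-bit words, because every ARM-state instruction is 4 bytes. However, a flat binary has no notion of which words are instructions. The Python sketch below is for illustration only. It rebuilds the image from the 12 words QEMU dumps further down, assuming those are exactly the contents of program.bin, and shows that the word at 0x4 holds the `.byte 10, 20, 25` data rather than code. Read the same way, the last two words are the constants loaded by the two `ldr` instructions, which leaves 9 actual instructions.

```python
import struct

# The 12 words from the QEMU `xp /12wi 0x0` dump in this question, assumed to
# be exactly the bytes of program.bin (the target here is little-endian).
WORDS = [
    0xEA000000, 0x0019140A, 0xE59F0018, 0xE59F1018,
    0xE3A03000, 0xE4D12001, 0xE0823003, 0xE1510000,
    0x1AFFFFFB, 0xEAFFFFFE, 0x00000007, 0x00000004,
]

# Pack the words back into a flat little-endian binary, like program.bin.
image = struct.pack("<12I", *WORDS)

print(len(image))        # 48 bytes
print(len(image) // 4)   # 12 words of 4 bytes each
print(list(image[4:8]))  # [10, 20, 25, 0]: the .byte data plus one padding byte
```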

Now I prepare a 16 MiB flash image (4096 blocks of 4096 bytes), copy the .bin file into it and start a QEMU simulation of the connex board like this:

    dd if=/dev/zero of=flash.bin bs=4096 count=4096
    dd if=program.bin of=flash.bin conv=notrunc
    qemu-system-arm -M connex -nographic -serial /dev/null -pflash flash.bin
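Here is a quick check of the `dd` arithmetic. `bs=4096 count=4096` writes 4096 blocks of 4096 bytes, so the image is 16 MiB (16 × 1024 × 1024 bytes). `conv=notrunc` then overwrites only its first 48 bytes with program.bin. A minimal Python sketch of those numbers:

```python
BLOCK_SIZE = 4096    # dd bs=4096
BLOCK_COUNT = 4096   # dd count=4096
PROGRAM_SIZE = 48    # size of program.bin reported by ls -l

flash_size = BLOCK_SIZE * BLOCK_COUNT
print(flash_size)                  # 16777216
print(flash_size == 16 * 1024**2)  # True, i.e. 16 MiB

# conv=notrunc keeps flash.bin at full size; only its first 48 bytes change,
# and the rest of the image stays zero-filled.
print(flash_size - PROGRAM_SIZE)   # 16777168 untouched zero bytes
```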

While in the simulator, if I check the registers using `info registers`, I get:

    R00=00000007 R01=00000007 R02=00000019 R03=00000037
    R04=00000000 R05=00000000 R06=00000000 R07=00000000
    R08=00000000 R09=00000000 R10=00000000 R11=00000000
    R12=00000000 R13=00000000 R14=00000000 R15=00000024
    PSR=600001d3 -ZC- A svc32
    FPSCR: 00000000

This looks fine: register R03 holds the hexadecimal value 0x37, which is 55 in decimal and the correct sum of the array I supplied with the directive `.byte 10, 20, 25`.

What I don't understand is what I get if I dump the first 12 words as instructions using `xp /12wi 0x0`:

    0x00000000:  ea000000      b    0x8
    0x00000004:  0019140a      andseq   r1, r9, sl, lsl #8
    0x00000008:  e59f0018      ldr  r0, [pc, #24]   ; 0x28
    0x0000000c:  e59f1018      ldr  r1, [pc, #24]   ; 0x2c
    0x00000010:  e3a03000      mov  r3, #0  ; 0x0
    0x00000014:  e4d12001      ldrb r2, [r1], #1
    0x00000018:  e0823003      add  r3, r2, r3
    0x0000001c:  e1510000      cmp  r1, r0
    0x00000020:  1afffffb      bne  0x14
    0x00000024:  eafffffe      b    0x24
    0x00000028:  00000007      andeq    r0, r0, r7
    0x0000002c:  00000004      andeq    r0, r0, r4

How can I justify commands 2-4 and 11-12 to myself? And how the he** is my sum 55 even calculated?

71GA
  • What part don't you understand? At offset 0x04 is a data value; the disassembler just disassembles what it sees, so you ignore that. Likewise, the last two `andeq`s are also bogus disassemblies of data values. The instruction at 0x0C is reading the ADDRESS 0x00000004, then the instruction at 0x14 is loading the byte data at 0x00000004, and your loop goes until the address in r1 reaches 0x00000007. – old_timer Jan 03 '18 at 18:14
  • It implemented the code you asked for; did you not understand the code you wrote? I'm very confused as to what you are asking. – old_timer Jan 03 '18 at 18:15
  • For each one that you don't understand, add text to your question: this is the instruction, this is what I wanted it to do, but I don't understand what it is doing. – old_timer Jan 03 '18 at 18:17
  • Your comments proved quite helpful! It is actually data there, and data sometimes looks like rubbish if you try to interpret it as ARM instructions! Thank you! – 71GA Jan 03 '18 at 19:32
  • This is just one simple example with GCC in which I am still avoiding the use of the .data section. But I do understand the sections defined in the linker script. – 71GA Jan 03 '18 at 20:05
  • Can anyone explain how the instruction `ldr r0, [pc, #24]` loads the value from address `0x28` into register `r0`? I thought the value of `pc` at the time this instruction is executed is `0xC`, and if I add `24 == 0x18` to it I get `0x24`, not `0x28`. `pc` should hold the address of the next instruction to be fetched... – 71GA Jan 03 '18 at 20:14
  • On ARM, `pc` points two instructions ahead. – Jester Jan 03 '18 at 22:21
  • Is it because of the 3-stage pipeline or is there any other reason? – 71GA Jan 03 '18 at 22:23
  • It is probably because the ARM1 had a three-stage pipeline; today it is a compatibility matter, independent of the pipeline design. It is simply how it works, by definition (imagine if compilers and hand-written asm had to change for every core ARM produces). On other architectures the offset is relative to the end of the instruction in question, which can vary in length and look just as strange as this. Note that in Thumb mode it is also two instructions ahead (4 bytes), rather than 8 bytes as in ARM mode. – old_timer Jan 04 '18 at 02:29
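The PC + 8 rule described in these comments can be checked against the dump. The sketch below hand-decodes the two `ldr` words. It assumes the standard ARM-state layout of a single data transfer with an immediate offset (cond, `01`, I, P, U, B, W, L, Rn, Rt, imm12). `decode_ldr_literal` is a hypothetical helper written only for this illustration.

```python
def decode_ldr_literal(address, word):
    """Decode an ARM-state `LDR Rt, [pc, #+/-imm12]` found at `address`.

    Returns (Rt, address of the word that gets loaded)."""
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        raise ValueError("not a single data transfer with an immediate offset")
    if not (word >> 20) & 1 or (word >> 22) & 1:
        raise ValueError("not a word load (LDR)")
    if not (word >> 24) & 1 or (word >> 21) & 1:
        raise ValueError("expected pre-indexed addressing without writeback")
    if (word >> 16) & 0xF != 15:
        raise ValueError("base register is not pc")
    rt = (word >> 12) & 0xF
    imm12 = word & 0xFFF            # 0x018 == 24 for both words below
    add = (word >> 23) & 1          # U bit: 1 = pc + imm12, 0 = pc - imm12
    pc = address + 8                # ARM state: pc reads as this address + 8
    target = pc + imm12 if add else pc - imm12
    return rt, target


for addr, word in [(0x08, 0xE59F0018), (0x0C, 0xE59F1018)]:
    rt, target = decode_ldr_literal(addr, word)
    print(f"0x{addr:02x}: r{rt} <- word at 0x{target:02x}")
```

With `pc` read as the instruction address + 8, the `ldr` at 0x8 loads from 0x10 + 0x18 = 0x28. Using 0xC + 0x18 would give 0x24 instead.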

2 Answers

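`ldr r0, =eoa` and `ldr r1, =array` are pseudo-instructions: they do not exist in the ARM instruction set and cannot be converted directly into a single machine instruction. When the assembler sees them, it converts them into whatever encoding is most efficient. For an address like these, that is a PC-relative `ldr` from a constant the assembler stores in a literal pool after your code.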

If you look at your disassembly:

    0x00000008:  e59f0018      ldr  r0, [pc, #24]   ; 0x28
    0x0000000c:  e59f1018      ldr  r1, [pc, #24]   ; 0x2c
    (...)
    0x00000028:  00000007      andeq    r0, r0, r7
    0x0000002c:  00000004      andeq    r0, r0, r4

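You can see that the assembler stored the end of the array (`eoa`, i.e. the base address of the array + 3 bytes = 0x7) at offset 0x28. It stored the base address of your array (0x4, right after the 4-byte branch at 0x0) at offset 0x2c. Both `ldr` instructions are PC-relative, and in ARM state `pc` reads as the address of the current instruction + 8. So the `ldr` at 0x8 loads r0 from 0x8 + 8 + 24 = 0x28, and the `ldr` at 0xc loads r1 from 0xc + 8 + 24 = 0x2c.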
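To see how the sum of 55 comes out, here is a small Python model of the loop. It assumes memory holds exactly the 12 dumped words (little-endian) and follows each instruction literally. The two literals supply r0 = 0x7 and r1 = 0x4, and `ldrb r2, [r1], #1` then reads the bytes at 0x4, 0x5 and 0x6 (10, 20, 25) until r1 equals r0. `run_sum_loop` and `load_word` are hypothetical names used only here.

```python
import struct

# The 12 words dumped by QEMU at 0x0 (assumed to be the whole program image).
WORDS = [
    0xEA000000, 0x0019140A, 0xE59F0018, 0xE59F1018,
    0xE3A03000, 0xE4D12001, 0xE0823003, 0xE1510000,
    0x1AFFFFFB, 0xEAFFFFFE, 0x00000007, 0x00000004,
]
MEMORY = struct.pack("<12I", *WORDS)  # byte-addressable, little-endian


def load_word(addr):
    """Read a 32-bit little-endian word, like a plain `ldr`."""
    return struct.unpack_from("<I", MEMORY, addr)[0]


def run_sum_loop():
    r0 = load_word(0x28)   # ldr r0, [pc, #24]  -> literal 0x7 (eoa)
    r1 = load_word(0x2C)   # ldr r1, [pc, #24]  -> literal 0x4 (array)
    r3 = 0                 # mov r3, #0
    while True:
        r2 = MEMORY[r1]    # ldrb r2, [r1], #1: load one byte...
        r1 += 1            # ...then post-increment the pointer
        r3 = r2 + r3       # add r3, r2, r3
        if r1 == r0:       # cmp r1, r0 ; bne loop
            break
    return r0, r1, r2, r3


print([hex(r) for r in run_sum_loop()])  # ['0x7', '0x7', '0x19', '0x37']
```

The final values match R00 to R03 in the question's `info registers` output.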

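Note: `ldr rN, =constant` can also be encoded as a single instruction, with no literal pool entry, when the constant fits in an immediate. In ARM mode, `mov` accepts an 8-bit value rotated right by an even number of bits. ARMv6T2/ARMv7 and later cores also have `movw`, which accepts any 16-bit value (0 to 65535). See the `mov` instruction on page 2 of the ARM instruction set quick reference card. The assembler will try to use the most efficient encoding.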

OlivierM
  • Thank you for this explanation! It further deepens my knowledge and deserves to be the accepted answer! – 71GA Jan 08 '18 at 22:00
  • So in this case the **ARM pseudo-instruction is split into two parts**?! The first part is the ARM instruction `ldr r0, [pc, #24]` located at `0x8`, which points to the second part: the constant `0x7` located at `0x28`... Interesting. I found one more interesting example, but I will open another topic for that one. – 71GA Jan 09 '18 at 08:21
  • It is also possible for `ldr rN, =#immediate` to be encoded as a single instruction. The ARM instruction set supports a 16-bit immediate value (i.e. from 0 to 65535). See the `mov` instruction on page 2 of the [ARM instruction set](http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf). The assembler will try to use the most efficient encoding. – OlivierM Jan 09 '18 at 12:01
  • Ah, I see, it would be smart to use an immediate value here, yes! Thank you for the reference to the ARM instruction set! It is nice to see this kind of valuable data there as well! – 71GA Jan 09 '18 at 12:10
  • ARM immediates go through the barrel shifter; it's not just `0..65535` that you can set up with one 32-bit instruction. What matters is the distance between the lowest and highest set bit (or unset bit, for `mvn`). http://www.davespace.co.uk/arm/introduction-to-arm/immediates.html. Only 8 of the 12 immediate-field bits hold the value directly; the other 4 bits are a rotate count (multiplied by 2). And that's in ARM mode. Thumb2 mode has fewer immediate bits. – Peter Cordes Jan 10 '18 at 02:38
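The ARM-mode immediate rule described above can be written out as a check. This is a minimal Python sketch. It assumes the classic ARM-state data-processing encoding, where an 8-bit value is rotated right by twice a 4-bit count. `arm_immediate` and `pick_encoding` are hypothetical helpers. `pick_encoding` is only a rough model of the choice an assembler faces for `ldr rN, =constant`, not GNU as's actual logic; for example, it ignores `movw`.

```python
MASK32 = 0xFFFFFFFF


def arm_immediate(value):
    """Return (imm8, rot) with value == ROR(imm8, 2 * rot), or None if the
    32-bit value cannot be an ARM-state data-processing immediate."""
    value &= MASK32
    for rot in range(16):
        shift = 2 * rot
        # Undo a right rotation by `shift` bits by rotating left instead.
        imm8 = ((value << shift) | (value >> (32 - shift))) & MASK32
        if imm8 <= 0xFF:
            return imm8, rot  # first (smallest) rotation that works
    return None


def pick_encoding(value):
    """Rough model of choosing an encoding for `ldr rN, =value`."""
    if arm_immediate(value) is not None:
        return "mov"
    if arm_immediate(~value & MASK32) is not None:
        return "mvn"  # mvn loads the bitwise NOT of its immediate
    return "literal pool"


for v in (7, 0x100, 0xFF000000, 0xF000000F, 0xFFFFFF00, 0x101, 0x1234, 65535):
    print(hex(v), arm_immediate(v), pick_encoding(v))
```

A value like 0x100 is representable, while 0x101 and 65535 are not, because their set bits do not fit in one 8-bit window.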

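Actually, replacing both `ldr rN, =label` with `adr rN, label` would be more efficient! ADR forms a PC-relative address from a label using a single `add` or `sub` on `pc`, so no literal pool entry or extra memory load is needed. Your code would be built as: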

    00000000 <entry>:
       0:   eafffffe    b   8 <_start>

    00000004 <array>:
       4:   140a        .short  0x140a
       6:   19          .byte   0x19

    00000007 <eoa>:
        ...

    00000008 <_start>:
       8:   e24f0009    sub r0, pc, #9
       c:   e24f1010    sub r1, pc, #16
      10:   e3a03000    mov r3, #0

    00000014 <loop>:
      14:   e4d12001    ldrb    r2, [r1], #1
      18:   e0823003    add r3, r2, r3
      1c:   e1510000    cmp r1, r0
      20:   1afffffb    bne 14 <loop>

    00000024 <stop>:
      24:   eafffffe    b   24 <stop>

Note: the only change I made to your original code was to replace:

    ldr r0, =eoa
    ldr r1, =array

with:

    adr r0, eoa
    adr r1, array
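The `sub` immediates in that listing follow from the same PC + 8 rule. `adr` becomes `add rN, pc, #off` or `sub rN, pc, #off`, with off = label − (instruction address + 8). Below is a small Python sketch of that arithmetic. `adr_encoding` is a hypothetical helper that only handles offsets up to 255, which always fit an ARM immediate. Larger offsets would need the rotated-immediate check, or more than one instruction.

```python
def adr_encoding(rd, insn_addr, label_addr):
    """Return the single ARM-state instruction that `adr r<rd>, label`
    can assemble to, for small PC-relative offsets only."""
    pc = insn_addr + 8                 # pc reads 8 bytes ahead in ARM state
    offset = label_addr - pc
    if abs(offset) > 0xFF:
        raise ValueError("offset too large for this simple sketch")
    if offset >= 0:
        return f"add r{rd}, pc, #{offset}"
    return f"sub r{rd}, pc, #{-offset}"


# adr r0, eoa at 0x8 (eoa = 0x7) and adr r1, array at 0xc (array = 0x4)
print(adr_encoding(0, 0x8, 0x7))   # sub r0, pc, #9
print(adr_encoding(1, 0xC, 0x4))   # sub r1, pc, #16
```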
OlivierM
  • This should probably be a second section in your original answer. – Peter Cordes Jan 09 '18 at 13:06
  • I asked myself the same question before posting it. But this second answer does not really answer the original question, and I thought separating the two answers would make the message clearer. – OlivierM Jan 09 '18 at 14:04
  • Yes, exactly because it isn't a separate answer to the original question, it should be a "what you can do instead" section in your answer that does answer the question. When I've posted 2 answers to the same question, it was an AVX512 answer vs. an AVX2 answer, where they answer the same question using totally different instructions. https://stackoverflow.com/questions/36932240/avx2-what-is-the-most-efficient-way-to-pack-left-based-on-a-mask. – Peter Cordes Jan 10 '18 at 02:40