I've isolated the fact that I cannot branch link to putchar no matter how hard I try.

Even two lines alone like

mov r0,$48
bl putchar

Will always segfault when I'm expecting it to print ASCII 0

I can branch to putchar, and it will work, but I cannot branch link. Meaning

mov r0,$48
b putchar

will work

I feel like I'm missing something incredibly basic, but I cannot figure out why. I can only assume it has something to do with the return from putchar, but I have no idea what.

Sorry if this seems like a dumb question, but I honestly could not find a resource on this.

Edit: Although the above statements are true even for a standalone program, I am ultimately implementing this in a subroutine, which I figured might be important.

Josh Hu
  • You didn't give us much. A blind guess from me: you don't save LR where you need to, so your code bombs out somewhere upon an attempt to return from a subroutine. – tum_ Apr 16 '19 at 05:48
  • Please show us your whole program. The error is likely somewhere else in your code. – fuz Apr 16 '19 at 11:57
  • what processor/architecture? – old_timer Apr 16 '19 at 13:26

2 Answers

This is difficult to say because you did not provide enough code, but you may be missing the code required to comply with the ARM calling convention.
A complete function should save fp and lr on the stack, then call putchar, then either restore fp and lr and return, or restore fp and pop the saved lr straight into pc, which is essentially the same.

Create a file named example.s with the following content:

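        .arch   armv7-a
        .syntax unified
        .arm
        .text
        .align  2
        .globl  main
main:
        push    {fp, lr}        @ save the frame pointer and the return address
        mov     r0, #48         @ ASCII '0'
        bl      putchar         @ overwrites lr with the address of the next instruction
        pop     {fp, pc}        @ restore fp and return via the saved lr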

Compile and link it - I compiled a static version because I tested with qemu-arm:

/opt/arm/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin/arm-linux-gnueabihf-gcc -static -O0 -o example  example.s

Execute it - using qemu-arm in my case

/opt/qemu-3.1.0-static/bin/qemu-arm example
0

Please note that:

pop     {fp, pc}

is equivalent to:

pop     {fp, lr}
bx      lr

I hope this helps.

Update

putchar() returns either the character that was written or EOF in r0. Since main does not modify r0 after the call, that value is returned to main's caller and becomes the exit status of the process, which the shell (bash here) reports through the echo $? command:

/opt/qemu-3.1.0/bin/qemu-arm example
0
echo $?
48
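
If an exit status of 48 is unwanted, one option is to set r0 after the call. A minimal sketch, assuming the same armv7-a toolchain as above and that main should report 0; the `mov r0, #0` line is an addition, not part of the original example:

```asm
        .arch   armv7-a
        .syntax unified
        .arm
        .text
        .align  2
        .globl  main
main:
        push    {fp, lr}        @ save the frame pointer and the return address
        mov     r0, #48         @ ASCII '0'
        bl      putchar         @ returns the character written (or EOF) in r0
        mov     r0, #0          @ assumption: we want main to return 0 instead
        pop     {fp, pc}        @ restore fp and return via the saved lr
```

Built and run the same way as before, echo $? should then report the value placed in r0 rather than putchar's return value.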

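According to page 15 of the ARM calling conventions, r4-r8 are preserved across subroutine calls, but r0-r3 may not be. (The same document also lists r10, r11 and sp as preserved, and r12/ip as a scratch register like r0-r3.)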
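
The practical consequence for a subroutine that needs a value after calling putchar is to keep that value in a callee-saved register such as r4. A minimal sketch, assuming ARM state on ARMv5T or later; print_pair and its two-argument signature are hypothetical, chosen only for illustration:

```asm
        .syntax unified
        .arm
        .text
        .globl  print_pair          @ hypothetical name, not from the question
        .type   print_pair, %function
@ Assumed signature: void print_pair(int first, int second)
@ On entry: r0 = first, r1 = second
print_pair:
        push    {r4, lr}            @ save r4 and the return address (8 bytes keeps sp 8-byte aligned)
        mov     r4, r1              @ r1 may be clobbered by putchar, so park 'second' in r4
        bl      putchar             @ prints 'first'; r0-r3, ip and lr are not preserved
        mov     r0, r4              @ 'second' survived the call in r4
        bl      putchar             @ prints 'second'
        pop     {r4, pc}            @ restore r4 and return by loading the saved lr into pc
```

Pushing r4 is what makes it safe to use: the caller of print_pair is entitled to find its own r4 unchanged afterwards.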

Using objdump for disassembling the example program:

/opt/arm/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin/arm-linux-gnueabihf-objdump -D example > example.lst

In example.lst, you can see that putchar() is:
1) saving r4, r5, r6, r7, r8 and lr on entry, as the ARM calling convention requires for r4-r8,
2) making use of the registers you mention as having been modified:

00016f50 <putchar>:
   16f50:   e92d41f0    push    {r4, r5, r6, r7, r8, lr}
   16f54:   e30354a8    movw    r5, #13480  ; 0x34a8
   16f58:   e3405008    movt    r5, #8
   16f5c:   e1a06000    mov r6, r0
   16f60:   e5954000    ldr r4, [r5]
   16f64:   e5943000    ldr r3, [r4]
   16f68:   e3130902    tst r3, #32768  ; 0x8000
   16f6c:   1a000015    bne 16fc8 <putchar+0x78>
   16f70:   e5943048    ldr r3, [r4, #72]   ; 0x48
   16f74:   ee1d7f70    mrc 15, 0, r7, cr13, cr0, {3}
   16f78:   e2477d13    sub r7, r7, #1216   ; 0x4c0
   16f7c:   e5932008    ldr r2, [r3, #8]
   16f80:   e1520007    cmp r2, r7
   16f84:   0a000030    beq 1704c <putchar+0xfc>
   16f88:   e3a02001    mov r2, #1
   16f8c:   e1931f9f    ldrex   r1, [r3]
   16f90:   e3510000    cmp r1, #0
   16f94:   1a000003    bne 16fa8 <putchar+0x58>
   16f98:   e1830f92    strex   r0, r2, [r3]
   16f9c:   e3500000    cmp r0, #0
   16fa0:   1afffff9    bne 16f8c <putchar+0x3c>
   16fa4:   f57ff05b    dmb ish
   16fa8:   1a00002d    bne 17064 <putchar+0x114>
   16fac:   e5943048    ldr r3, [r4, #72]   ; 0x48
   16fb0:   e5950000    ldr r0, [r5]
   16fb4:   e5837008    str r7, [r3, #8]
   16fb8:   e5932004    ldr r2, [r3, #4]
   16fbc:   e2822001    add r2, r2, #1
   16fc0:   e5832004    str r2, [r3, #4]
   16fc4:   ea000000    b   16fcc <putchar+0x7c>
   16fc8:   e1a00004    mov r0, r4
   16fcc:   e5903014    ldr r3, [r0, #20]
   16fd0:   e6efc076    uxtb    ip, r6
   16fd4:   e5902018    ldr r2, [r0, #24]
   16fd8:   e1530002    cmp r3, r2
   16fdc:   32832001    addcc   r2, r3, #1
   16fe0:   35802014    strcc   r2, [r0, #20]
   16fe4:   35c36000    strbcc  r6, [r3]
   16fe8:   2a000019    bcs 17054 <putchar+0x104>
   16fec:   e5943000    ldr r3, [r4]
   16ff0:   e3130902    tst r3, #32768  ; 0x8000
   16ff4:   1a000005    bne 17010 <putchar+0xc0>
   16ff8:   e5940048    ldr r0, [r4, #72]   ; 0x48
   16ffc:   e5903004    ldr r3, [r0, #4]
   17000:   e2433001    sub r3, r3, #1
   17004:   e5803004    str r3, [r0, #4]
   17008:   e3530000    cmp r3, #0
   1700c:   0a000001    beq 17018 <putchar+0xc8>
   17010:   e1a0000c    mov r0, ip
   17014:   e8bd81f0    pop {r4, r5, r6, r7, r8, pc}
   17018:   e5803008    str r3, [r0, #8]
   1701c:   f57ff05b    dmb ish
   17020:   e1902f9f    ldrex   r2, [r0]
   17024:   e1801f93    strex   r1, r3, [r0]
   17028:   e3510000    cmp r1, #0
   1702c:   1afffffb    bne 17020 <putchar+0xd0>
   17030:   e3520001    cmp r2, #1
   17034:   dafffff5    ble 17010 <putchar+0xc0>
   17038:   e3a01081    mov r1, #129    ; 0x81
   1703c:   e3a02001    mov r2, #1
   17040:   e3a070f0    mov r7, #240    ; 0xf0
   17044:   ef000000    svc 0x00000000
   17048:   eafffff0    b   17010 <putchar+0xc0>
   1704c:   e1a00004    mov r0, r4
   17050:   eaffffd8    b   16fb8 <putchar+0x68>
   ...
Frant
  • Okay so this did work in resolving the segfault, but I have a small follow up question I was hoping you could answer. To my understanding, putchar only returns the char written or EOF, which should just be in r0 yes? For some reason, when I branch linked putchar it seemed to overwrite the values in some of the lower registers like r2 and r3, but not r5 and up. Is putchar using these registers, or is there another reason I'm not seeing? – Josh Hu Apr 16 '19 at 16:20
  • @Josh Hu: I augmented the answer above upon reading your comment. – Frant Apr 16 '19 at 17:08
  • Wow thanks! Unfortunately I don't have enough reputation to visibly increase the score of this response, but it was very helpful and enlightening – Josh Hu Apr 16 '19 at 17:14
  • @Josh Hu: no problem, I am glad it helped. – Frant Apr 16 '19 at 18:46
  • if the individual function is not modifying the fp then you dont need to. also this is not fully compatible with arm instruction sets, so we still need to know what instruction set yes? – old_timer Apr 16 '19 at 20:33
  • @old_timer: I see your point, but don't you think it is safer to follow the ABI rules when you are calling code you know nothing about in terms of implementation, such as glibc? What do you mean by not 'fully compatible with arm instruction sets'? The skeleton for example.s was created using `arm-linux-gnueabihf-gcc -O0 -S` from an empty main function: I would say that the generated code was standard armv7-a / 32-bit ARM code (`.arch armv7-a` and `.arm` directives). Thanks. – Frant Apr 16 '19 at 21:17
  • nowhere does it say this is armv7-a and armv7-a compilers generate thumb not arm by default right? this could be any of the architectures less than armv8, wasnt specified. – old_timer Apr 16 '19 at 22:48
  • but there is thumb code that could be used as an example that works on all architectures from armv4t up to armv7a and armv8m. and then not knowing the architecture doesnt matter. can still easily conform to the abi. – old_timer Apr 16 '19 at 22:50
  • @old_timer: I agree the exact target architecture is still unknown. Not sure about the default being thumb though: the gcc 8.3 I used comes directly from ARM, does not support armv6 and older architectures, but does generate 32-bit ARM code by default. Thumb would have been a good choice, yes, I just thought that a segmentation violation meant Linux and more likely ARM 32, even though this could have been thumb2 as well. If my gcc had generated thumb2 code, I would have used it in the example - just did not think about it... Thanks for the comments. – Frant Apr 17 '19 at 00:33
  • where I was headed but didnt get there is fp doesnt need to be saved in your example, any more than r4 etc as you dont modify them. for an arm instruction fp is fine but for thumb it can be a problem. thats where I was headed. either way the op is probably not preserving lr. If that push/pop doesnt work then no doubt we will get another question or comment. – old_timer Apr 17 '19 at 02:45
Let the compiler guide you. Compilers are not perfect, but if we assume the compiler is debugged, then its output works.

void next ( char );
void fun ( void )
{
    next(0x33);
}

00000000 <fun>:
   0:   e92d4010    push    {r4, lr}
   4:   e3a00033    mov r0, #51 ; 0x33
   8:   ebfffffe    bl  0 <next>
   c:   e8bd4010    pop {r4, lr}
  10:   e12fff1e    bx  lr

This is not linked, but it shows that lr needs to be preserved. r4 is there only to keep the stack aligned on a 64-bit boundary; a compiler old enough to predate that rule will push and pop lr alone.

For maximum compatibility, use the oldest Thumb instructions, which work on armv4t through to armv7a and armv8m:

00000000 <fun>:
   0:   b510        push    {r4, lr}
   2:   2033        movs    r0, #51 ; 0x33
   4:   f7ff fffe   bl  0 <next>
   8:   bc10        pop {r4}
   a:   bc01        pop {r0}
   c:   4700        bx  r0

(This does conform to the stack alignment, by the way).

As you figured out you can branch there:

00000000 <fun>:
   0:   2033        movs    r0, #51 ; 0x33
   2:   f7ff bffe   b.w 0 <next>

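but in this case it is a tail-call optimization: lr is never modified, so next returns directly to fun's caller. If you want to branch link and then return from your own code, this won't work for you.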

int next ( char );
int fun ( char a )
{
    return(next(a)+1);
}

00000000 <fun>:
   0:   b508        push    {r3, lr}
   2:   f7ff fffe   bl  0 <next>
   6:   3001        adds    r0, #1
   8:   bd08        pop {r3, pc}

At some point pop {pc} gained support for interworking: not in original Thumb, but in later versions (ARMv5T onward), yes. Like r4 above, r3 here is simply for 64-bit stack alignment. Many of the lower registers would have worked here; it's a don't-care kind of thing.

The other reason we need to know the architecture is to decide whether you can or should use bl at all: bl doesn't switch between ARM and Thumb modes, but blx does, if your architecture supports it.

00001000 <fun>:
    1000:   b508        push    {r3, lr}
    1002:   f000 e804   blx 100c <next>
    1006:   3001        adds    r0, #1
    1008:   bd08        pop {r3, pc}
    100a:   bf00        nop

0000100c <next>:
    100c:   e2800002    add r0, r0, #2
    1010:   e12fff1e    bx  lr



00001000 <fun>:
    1000:   e92d4010    push    {r4, lr}
    1004:   eb000003    bl  1018 <__next_from_arm>
    1008:   e8bd4010    pop {r4, lr}
    100c:   e2800001    add r0, r0, #1
    1010:   e12fff1e    bx  lr

00001014 <next>:
    1014:   3002        adds    r0, #2
    1016:   4770        bx  lr

00001018 <__next_from_arm>:
    1018:   e59fc000    ldr ip, [pc]    ; 1020 <__next_from_arm+0x8>
    101c:   e12fff1c    bx  ip
    1020:   00001015    andeq   r1, r0, r5, lsl r0
    1024:   00000000    andeq   r0, r0, r0

In both cases the linker fixed the problem, but older versions of the GNU tools do not do this; they just build bad code, and even more recent ones will build bad code if you are not paying attention. It depends on how lucky you are that day. So be very, very careful using the bl instruction.

I assume we all believe the problem is that you missed that bl modifies lr, which means that if you want to return from the function that uses bl, you need to save that return address and not destroy it.
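
As a rough illustration of that failure, here is a sketch assuming ARM state; bad_print is a hypothetical routine written for this purpose, not the asker's actual code:

```asm
        .syntax unified
        .arm
        .text
        .globl  bad_print           @ hypothetical name, for illustration only
bad_print:
        mov     r0, #48             @ ASCII '0'
        bl      putchar             @ lr is overwritten with the address of the next instruction
        bx      lr                  @ lr no longer holds bad_print's caller, so this does not return there
```

Saving lr on entry and restoring it (or popping it straight into pc) on exit, as in the compiler output above, avoids this.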

halfer
old_timer