1

I'm trying to iterate over a string in assembly, and change the lower case letters to upper case, and stop when the string is 0, but something seems very wrong (I seem to be missing a concept). I can't figure out what the problem is or what is going on.

Here is what I have so far:

Upper:
        movq    (%rbx), %rcx
        movq    $0, %rdi
        call    check
        call    fin
        add     %rdi, %rax
        ret

    fin:
        cmpb    $0x0, %r9b
        jne     check
        ret

    check:
        movb    (%rcx, %rdi), %r9b
        cmp     $'Z', %r9b
        jg      toUpper
        jmp     next

    toUpper:
        sub     %r9b, 0x20
        jmp     next

    next:
        incq    %rdi
        jmp     fin
1201ProgramAlarm
  • 32,384
  • 7
  • 42
  • 56
springday
  • 7
  • 4
  • 1
    Could you show how you call `Upper`? It does not seem to follow common calling conventions. – janw Mar 01 '20 at 10:10

1 Answers1

2

As it looks, your code is a bit convoluted, and it is hard to follow which algorithm you are trying to implement.

When approaching such a problem, it usually helps to write down the basic algorithm in C or pseudocode first:

  • For each char c
    • If c is a null byte: Done
    • If c is below 'a': Ignore
    • If c is above 'z': Ignore
    • Else: Add difference of 'A' and 'a' to c

This translates almost directly to the following assembly program:

upper:

    ; Read next character
    mov (%rdi), %al

    ; Test for zero byte
    test %al, %al
    je done

    ; Test for <'a' and >'z'
    cmp $'a', %al
    jl next
    cmp $'z', %al
    jg next

    ; We have a lower case character, so convert to upper case
    sub $0x20, %al ; Difference between 'A' and 'a'
    mov %al, (%rdi)

next:

    ; Increment pointer
    inc %rdi
    jmp upper

done:
    ret

This function expects the string pointer in rdi and thus can be directly called from C:

#include <stdio.h>

extern void upper(char *str);

int main()
{
    char str[] = "abc 123 <=>? 987 xyz!";
    upper(str);
    printf("Upper case: %s\n", str);

    return 0;
}

outputs

Upper case: ABC 123 <=>? 987 XYZ!
janw
  • 8,758
  • 11
  • 40
  • 62
  • 1
    Adding 0xe0 is an unusual way to write `-0x20`. Normally you'd `sub $0x20, %al` or just mask off the lower-case bit with `and $~0x20, %al`. Those are both more intuitive. Also, you can avoid using an unconditional `jmp` inside the loop by putting the load / test/jnz at the bottom. (And jumping to that for loop entry, or peeling that part of the first iteration). [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926) – Peter Cordes Mar 01 '20 at 19:23
  • Thanks for pointing that out, I have fixed the addition. I just calculated the difference and did not notice that the number was negative :D – janw Mar 01 '20 at 22:30
  • Regarding the loop structure: I agree with you, but it is deliberate in this case, to stay as near to the pseudocode as possible. In practice a do...while approach is clearly the way to go here (or just writing the program in C and using a compiler...). – janw Mar 01 '20 at 22:33
  • 1
    Yeah, I upvoted anyway even though I consider do{}while loops to be the normal / idiomatic way to loop in asm. But there's enough conditional logic that adding a loop-entry branch could make it harder to follow for someone for whom this code wasn't already obvious. BTW, [What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?](https://stackoverflow.com/a/54585515) has some more info, including how to range-check with sub/cmp/ja instead of 2x cmp/jcc. – Peter Cordes Mar 01 '20 at 22:56