Why does the program fail if I load directly into r0, but work if I LDR into r2 and then LDR into r0?

Question

I have this program that just returns the value that I am passing via command line.

This works:

.global main

main:
        ldr     r2, [r1,#4]    // get the argv[1] and put it in r2
        ldr     r0, [r2]       // put it in r0 from r2
        sub     r0, r0, #48    // from ascii value to actual decimal value
        bx      lr

What is not clear to me is why it doesn't work if I use r0 instead of r2. This, for example, doesn't work:

.global main

main:
        ldr     r0, [r1,#4]        // put the value immediately to r0
        sub     r0, r0, #48        // ascii to actual value
        bx      lr

If I execute the program with the value 7:

./program 7
echo $?

in the first case I get the actual value (7), but in the second I get 3.

Looks like your working example does an additional pointer dereference. If you did the same in the second example like `ldr r0, [r0]` it would probably work the same way. — ecm, Sep 09 '20 at 07:17
Should really close this as a duplicate of the other question, others can read the prior question, this one and the answers and decide if that is the case or not (or of course add their own answers or comments). — old_timer, Sep 09 '20 at 07:48

score 3 · Accepted Answer · edited Feb 04 '21 at 23:59

You are trying to do return(argv[1][0]-0x30), which is a bug as a general string conversion but works for a single character. Instead, your first version does this:

    ldr     r2, [r1,#4]    // address of argv[1]
    ldr     r0, [r2]       // read first four characters in argv[1]
                           // argv[1][0..3]
    sub     r0, r0, #48    // convert the first one to decimal
                           // leaving the other three unmodified
    bx      lr

This one is return( (*((unsigned int *)(&argv[1][0]))) - 0x30), which is a bug (as mentioned more than once in the prior question)(assuming I got all of my syntax right banging out this answer): it typecasts the char pointer to the first character to a word pointer to the first four characters and reads that. But

    ldr     r0, [r1,#4]    // address of argv[1]
    sub     r0, r0, #48    // modify address of argv[1]
    bx      lr

is return( ((unsigned int)(argv[1])) - 0x30), an even bigger bug (convert the pointer to the string to a word and then subtract from that address)(assuming I banged out the right syntax here as well).

You are modifying the address not any of the string data in this second case.

You need to cover both levels of indirection not just one. And a string is an array of bytes not an array of words.
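
In C terms, the two levels of indirection look roughly like the sketch below. The helper name `first_digit_value` is made up for illustration, and it only handles the single-character case discussed here; it is not a general string conversion.

```c
#include <stddef.h>

/* Hypothetical helper: value of the first character of argv[1],
   or -1 if there is no argv[1] or it does not start with a digit. */
int first_digit_value(int argc, char **argv)
{
    if (argc < 2 || argv[1] == NULL)
        return -1;

    char *str = argv[1];                      /* level 1: ldr  r2, [r1, #4] */
    unsigned char c = (unsigned char)str[0];  /* level 2: ldrb r0, [r2]     */

    if (c < '0' || c > '9')
        return -1;

    return c - '0';                           /* sub r0, r0, #48 */
}
```

A `main` that does `return first_digit_value(argc, argv);` would behave like the intended program for a one-digit argument. Note the byte-sized read in the second step: a string is an array of bytes, so only one byte is fetched.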

try

./program 77

Instead of 77, r0 will hold 14087 (0x3707) or some number like that with your supposedly working version. (The shell's `$?` only shows the low byte, so it may still print 7.)

All of this was covered in the prior question. Do you understand what a two dimensional array means? char argv[][]?

./program 77

argv itself points to an array of pointers

argv[0]
argv[1]
argv[2]

and then each of those points to a string

argv[0][0]='.'
argv[0][1]='/'
argv[0][2]='p'
argv[0][3]='r'
argv[0][4]='o'
...
argv[0][n]=0

argv[1][0]='7'
argv[1][1]='7'
argv[1][2]=0

r0 is argc
r1 is argv

so r1 contains the address to the ARRAY OF POINTERS

ldr r3,[r1,#0] //pointer to argv[0] string
ldr r4,[r1,#4] //pointer to argv[1] string
ldr r5,[r1,#8] //pointer to argv[2] string
...

You cannot skip that step: to access the string, you have to start at the beginning of the string.

Now once you have done the above THEN you can do this:

ldrb r0,[r4,#0] // argv[1][0] = '7'
ldrb r1,[r4,#1] // argv[1][1] = '7'
ldrb r2,[r4,#2] // argv[1][2] = 0

If you instead do

ldr r0,[r4,#0]

that reads all of argv[1][0] through argv[1][3] in one shot, assuming you don't get an alignment fault, because there is no reason why argv[1] has to point to a word-aligned address.

So that would put 0xZZ003737 in r0, where ZZ is an unknown/non-deterministic byte that is outside the argv[1] string; it could be argv[2][0], for example. You have experienced some dumb luck if you are doing

./program 7

and getting 0x00000037 by using the wrong instruction and wrong approach (for the nth time read and understand Frant's answer to the other question).
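
To see why the word load appears to work, here is a small C model of it. This is only a sketch: the function names are invented for this example, it assumes a little-endian target, and it ignores alignment. The second function models the masking to the low 8 bits that the comments below describe for `$?`.

```c
#include <stdint.h>

/* Assemble the 32-bit value a little-endian ldr would read from four
   consecutive bytes (alignment is ignored in this model). */
uint32_t load_word_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* What a traditional shell reports in $?: only the low 8 bits. */
unsigned exit_status_seen_by_shell(uint32_t r0)
{
    return r0 & 0xFFu;
}
```

For the bytes '7', '7', 0, 0xAB (0xAB standing in for the unknown ZZ byte), `load_word_le` gives 0xAB003737. Subtracting 0x30 gives 0xAB003707, and the shell would still report 7, which hides the bug.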

If you were to have this

char mystring[]="1234567";

would you use

mystring[0]-=0x30;

To convert that from a string (0x31,0x32,0x33,...0x37,0x00) to a value 1234567 (0x12d687)? Certainly not, that would not work at all. You would need to use atoi, atol, strtol, etc. (read Frant's answer) or roll your own.

rb=0;
for(ra=0;mystring[ra];ra++)
{
    rb*=10;
    rb+=mystring[ra]-0x30;
}

assuming we know ahead of time the user is passing in a decimal number in the string. (bad assumption, yet another bug doing something like this)
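
A slightly more defensive roll-your-own version might look like the following. It is a sketch, not a replacement for strtol: the name `parse_decimal` is made up here, and it accepts only unsigned decimal digits, with no sign or whitespace.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: parse an unsigned decimal string.
   Returns false on empty input, a non-digit character, or overflow;
   on success stores the value in *out and returns true. */
bool parse_decimal(const char *s, unsigned long *out)
{
    unsigned long value = 0;

    if (*s == '\0')
        return false;

    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return false;                 /* not a decimal digit */

        unsigned long digit = (unsigned long)(*s - '0');

        if (value > (ULONG_MAX - digit) / 10)
            return false;                 /* value * 10 + digit would overflow */

        value = value * 10 + digit;
    }

    *out = value;
    return true;
}
```

Unlike the loop above, this does not modify the string and it rejects input such as "12a" instead of silently producing a wrong number.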

doing this:

mystring[0]-=0x30;

only modifying one item does nothing to convert the string to a number.

to demonstrate all of this further, the operating system loader will fill in argv[][] for you in some memory you have access to.

So for example

./so 123

I am going to make up addresses for demonstration purposes

[address] data
[0x00001000] 0x00001008  pointer to argv[0]
[0x00001004] 0x0000100D  pointer to argv[1]
[0x00001008] 0x2E '.'
[0x00001009] 0x2F '/'
[0x0000100A] 0x73 's'
[0x0000100B] 0x6F 'o'
[0x0000100C] 0x00 string termination
[0x0000100D] 0x31 '1'
[0x0000100E] 0x32 '2'
[0x0000100F] 0x33 '3' 
[0x00001010] 0x00 string termination

So in this case r1 would be set to 0x00001000 before main is called.

So

ldr r2,[r1,#4]    // read 0x1004, r2 = 0x100D
ldrb r0,[r2]      // read 0x100D, r0 = 0x31
sub r0,r0,#0x30   // r0 = 1 (note: which is not equal to 123)

If you

ldr r2,[r1,#4]    // read 0x1004, r2 = 0x100D
ldr r0,[r2]       // read 0x100D, r0 = 0x00333231
sub r0,r0,#0x30   // r0 = 0x00333201 (note: which is not equal to 123 = 0x7B)

Plus that is an alignment fault if enabled.

If you

ldr r2,[r1,#4]    // read 0x1004, r2 = 0x100D
sub r0,r2,#0x30   // r0 = 0xFDD

And that is clearly wrong, that has no value whatsoever. Hosing a pointer to a string using a bad string conversion solution.
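
The three cases can be checked with a toy C model of the made-up memory map above. Everything here is an assumption for demonstration: the `BASE` address comes from the invented table, `ldr` and `ldrb` are plain C functions named after the instructions, the model is little-endian, and alignment faults are not modelled.

```c
#include <stdint.h>

#define BASE 0x1000u  /* made-up base address from the table above */

/* Bytes at 0x1000..0x1010, little-endian, as in the table. */
static const unsigned char mem[] = {
    0x08, 0x10, 0x00, 0x00,    /* 0x1000: pointer to argv[0] = 0x1008 */
    0x0D, 0x10, 0x00, 0x00,    /* 0x1004: pointer to argv[1] = 0x100D */
    '.', '/', 's', 'o', 0x00,  /* 0x1008: "./so" */
    '1', '2', '3', 0x00        /* 0x100D: "123"  */
};

/* Byte load: one byte, zero-extended, like ldrb. */
uint32_t ldrb(uint32_t addr)
{
    return mem[addr - BASE];
}

/* Word load: four bytes, least significant first, like ldr. */
uint32_t ldr(uint32_t addr)
{
    return ldrb(addr)
         | (ldrb(addr + 1) << 8)
         | (ldrb(addr + 2) << 16)
         | (ldrb(addr + 3) << 24);
}
```

With r1 = 0x1000, `ldr(r1 + 4)` gives 0x100D. From there `ldrb(0x100D) - 0x30` is 1, `ldr(0x100D) - 0x30` is 0x00333201, and subtracting 0x30 from the pointer itself gives 0xFDD, matching the three walkthroughs.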

Note:

ldr     r0, [r2]  // read word from address in r2 and put in r0

is not equal to

mov     r0, r2    // copy contents of r2 into r0

For at least the ARM tools and gas assembly languages, the [brackets] indicate a level of indirection: [r2] means the thing at the address contained in r2, whereas r2 means the contents of r2.

Two completely different instructions. You should have the arm documentation for the instruction set, the architectural reference manual for one of the architectures, start with armv5 if you don't know. Don't bother with ARM's Programmers' Reference Manuals; they create more questions than answers. The technical reference manual and architectural reference manual for the core in question is what you should always have BEFORE you start doing any work like this.

ARM does pretty well with its pseudocode, especially in the older ARM ARM; the newer one has more features and so more detail to cover.

Since some of us saw your prior/original question with the original content before modification and you are already calling C functions from main: then read Frant's answer with what you know now and just call another C function.

I think the OS or shell may limit the value in `$?` to a byte, masking with 255. That would explain how their first example in this question works: the more-significant bytes may be nonzero but are not accessed by `$?`. — ecm, Sep 09 '20 at 07:48
@ecm yeah couldnt quite figure out how that was being masked...that makes sense. Also if getting a three then it is dumb luck that the lower byte is 0x33. — old_timer, Sep 09 '20 at 07:50
@old_timer I don't know how can I thank you.... You REALLY helped me. Internet is awesome. Very interesting stuff! Also sorry for dumb questions, I am trying to learn :) — Mnkisd, Sep 09 '20 at 08:23
I get that but also you need to put some effort in, have and read the instruction set documentation any time you are trying to write assembly language or read it from a disassembly. Think about what a two dimensional array means, and how many levels of indirection you have to go to get to the items in question. — old_timer, Sep 09 '20 at 15:46
Good luck, have fun, IMO one should have more answers than questions at this site, so find some things you know about and pay it forward please. — old_timer, Sep 09 '20 at 15:48
@ecm: indeed, Unix and Linux only use the low byte of the arg passed to `_exit(int)`, using `WEXITSTATUS(status)` in the parent to extract that bitfield from an integer that includes other fields (like signal number if it died from an uncaught signal). [Return value range of the main function](https://stackoverflow.com/a/5149399) explains that it's actually possible to recover the full 32-bit exit status using `waitid` instead of `wait` / `waitpid`, but traditional shells like bash don't do that. POSIX specifies that an exit status is 8 bits. — Peter Cordes, Sep 12 '20 at 18:39
