Why is the = operator in ARM Assembly used both for the address of a label and for an immediate "literal value"?

Question

In ARM Assembly the = operator can be used to load the address of a label, like this:

LDR R0,=var

and the value of the label can then be loaded like this:

LDR R0,[R0]

This is all easy to understand.

In ARM Assembly, the = operator can also be used to load a "literal value", like this:

LDR R1,=0x40021010

This is in itself also easy to understand.

But what I do not have a good idea about is the exact rationale and history for why the same operator is used in both cases, since they are not the same case. Why is it used for both?

Presumably both cases are meta instructions that can assemble to more than one instruction, or use a literal pool nearby or such, if needed. — ecm, Aug 13 '23 at 10:15
The value of a label in assembly is the address it was placed at, not the content of memory at that address. So the two use cases are indeed identical. — fuz, Aug 13 '23 at 10:33
@fuz Hm, good point. I'll have to let it digest a bit but maybe that's it. — BipedalJoe, Aug 13 '23 at 10:39
Yeah, I'm just not sure why you see these as different. An address is just a number, after all. The value of that number is worked out at build time by the assembler / linker instead of being written explicitly by the programmer, that's all. Consider for instance that `.word 0xdeadbeef` and `.word var` are both equally valid. — Nate Eldredge, Aug 13 '23 at 13:02
@NateEldredge It is often easy to see why someone might misunderstand something, but it requires that you choose to take that perspective. In hindsight I don't think my question was that weird, and @ fuz answered it well and on point. – BipedalJoe, Aug 13 '23 at 14:09
No, I didn't mean to imply that your question was weird - sorry if my comment came out as rude. It just seemed like it was based on some misconception that I couldn't identify. From your previous comment it wasn't clear whether fuz had resolved it for you, so I thought if you were still unsure, perhaps you could explain more about in what way you thought they were different, and that might help someone give a helpful explanation. Anyway, glad you have got it worked out! — Nate Eldredge, Aug 13 '23 at 15:32
@NateEldredge It's a good question and I have to think a bit about why I assumed what I did. As for @ fuz answer, I think I accepted it right away because it made sense and added up, but, sometimes when you wondered about something for some time, it has to "settle" a bit. At least for me. That's what I meant with having to digest it. — BipedalJoe, Aug 13 '23 at 15:46
I think, for example, that I originally interpreted =var as analogous to &var in C. Based on that, it would not make sense to do &0x40021010 and it returns the value 0x40021010, it should instead return the pointer to it. I probably just made the wrong initial assumption, then never had time to really make the right one. Can't say for sure that is why, but I know I thought of =var as &var. — BipedalJoe, Aug 13 '23 at 15:50
Think of labels / symbols in assembly as being like C `extern char symbol[]` - using the bare name gives you the address. (In some assemblers for other ISAs, there are contexts where that's not true, notably MASM for x86. But other x86 assemblers like NASM are more consistent, and `mov edi, symbol` uses the address as an immediate. *[Why in NASM do we have to use square brackets (\[ \]) to MOV to memory location?](https://stackoverflow.com/q/49534661)*) — Peter Cordes, Aug 13 '23 at 16:08
@PeterCordes I already understand the answer to the question since @ fuz explained it a few comments above, as I said then. Then @ Nate Eldredge asked why I did not understand it before that. I then replied one reason that could explain why. And also clarified that I had understood the explanation @ fuz gave. — BipedalJoe, Aug 13 '23 at 16:18
@BipedalJoe: Yeah, in some sense, it's C that is "weird" here. You can write `f(var)` and `f(0x40021010)` interchangeably in C expressions, but they result in fundamentally different code: using `var` causes a load from memory, while using `0x40021010` typically does not. — Nate Eldredge, Aug 13 '23 at 16:19
@PeterCordes But thank you anyway. You helped me out a lot with my old question about another topic. Peace. — BipedalJoe, Aug 13 '23 at 16:19
@NateEldredge would not doubt that. I prefer assembly or as low level as possible that I can work on. But I think my misunderstanding may be because I was thinking with some analogies from some higher level language, and I know I assumed = was like & and it must have stuck with me... also good point with .word 0xdeadbeef and .word var in your first comment, I tested it and assembled it and it works like you said. — BipedalJoe, Aug 13 '23 at 16:22
@BipedalJoe: As you can see, many of us like these questions as an opportunity to share the way that we think about something, or to mention related facts that we think are interesting. Please don't be offended - it's not meant to imply that you are stupid and need it explained over and over again. If you're happy with an explanation you've already gotten, that's great; these just give you some other possible ways you could think about the issue, and you can take them or leave them. — Nate Eldredge, Aug 13 '23 at 16:26
@NateEldredge I'm not offended. I don't see a need to have some advanced way to think about the topic since, as @ fuz explained, it is actually quite simple and doesn't need any elaborate way to think about it. I'd just started out from the wrong assumption, applied the wrong pattern to it, and once you've done that the brain can have a hard time seeing through it. — BipedalJoe, Aug 13 '23 at 16:30
@NateEldredge My first reply to you was also not meant to be offensive. I just emphasized that in pedagogy (a Q&A site is 50% technical expertise and 50% "pedagogical" expertise) you often _can_ see why someone did not understand something, but it can take some effort. And since I had to put some effort into it myself to even understand why, you were pretty right in asking what you did. Overall, your question was good, @ fuz answer was great, and @ PeterCordes helped me out a lot on another question half a year ago or so that was very helpful. — BipedalJoe, Aug 13 '23 at 16:32
(I could have reflected right away on why I did not get the answer directly and had to ask the question, but I was out for a walk and on my mobile phone, typing there is tedious...) peace! — BipedalJoe, Aug 13 '23 at 16:33
@BipedalJoe: Sounds good. I did actually spend a few minutes before posting trying to figure out what you might be thinking of - I'm a college professor so I have a lot of practice in that :) Anyway, glad it all worked out, and have a good day! — Nate Eldredge, Aug 13 '23 at 17:38
@NateEldredge You have my upvote for that :) Always good in pedagogy fields. And I did the reverse, spent a few minutes after replying to figure out what my wrong assumption had been. A good day to you too! — BipedalJoe, Aug 13 '23 at 17:45
The = in arm assembly is not really an operator in any classic sense, but rather syntax (i.e. a token) informing the assembler of a particular intent. — Erik Eidt, Aug 13 '23 at 23:49
@ErikEidt Hi, yes, I can agree operator may not be the right word, as it is not a mathematical or logical operation. But operator can also be used more broadly, as with the "indirection operator", *. If I re-asked the question maybe I'd use a different term. Peace. – BipedalJoe, Aug 14 '23 at 09:46
About the two people who closed the question, while I am not affected by it because @fuz kindly answered it, and did so very easily and clearly and quickly, I would not agree that the "already has an answer" is true, pedagogically. As already explained in this comment thread to @ NateEldredge, my assumption had been that = was similar to &. Not that a label did not have an address (thus was like a pointer). My misunderstanding was about why, if in the first case =label was a pointer to the value, it was not a pointer with immediate values. — BipedalJoe, Aug 14 '23 at 21:34
I was assuming, pattern-wise, that label values and immediate values would be managed in a similar way. I had no problem accepting that "the value of a label in assembly is the address it was placed at", as @ fuz put it. I am mostly indifferent to question being closed, since what I was looking for is an answer to my question, that I've gotten too (and I'm thankful for that. ) — BipedalJoe, Aug 14 '23 at 21:34
But, since a Q&A site is about connecting questions with answers, I'd disagree that the question can be framed as being the same (as in, what would someone with the same question actually wonder about) as what whoever closed this said it was. I'm not saying my question is a good question, or not retarded (if the two people who closed it think it is), just that it is not the same question as the claimed duplicate. — BipedalJoe, Aug 14 '23 at 21:34
@BipedalJoe Is it related in any way at all? This closed question will be linked to the 'duplicate'. You can argue that it is not 100% conceptually the same, but the 'equal' pseudo-op is basically to get constant values in ARM traditional assembler. It is wrong to think of it as an 'address' operation. In fact, this is related; what is a label in assembler? A label is the assembler way of making an address. There are actually 10-20 similar questions under the ARM tag. 'Research' should have found [ARM pseudo-ops](https://sourceware.org/binutils/docs/as/ARM-Opcodes.html). – artless noise, Aug 15 '23 at 13:28
@artlessnoise It depends on perspective. My wrong assumption was actually that =0x40021010 would behave like the linked post says the label behaves, and not the other way around. So it would probably be better to link it to some explanation of the =0x40021010 syntax instead of =label. It's not a big deal, it's just about what level of beginners' questions should be open; somewhere out there someone may have the same wrong assumption I did. – BipedalJoe, Aug 15 '23 at 15:01
(And if closed questions are not deleted, it is even less of a deal. I may have misunderstood them as being deleted. ) Much thanks to everyone who contributed to me being able to understand it. Peace! — BipedalJoe, Aug 15 '23 at 15:03

Peter Cordes · Answer 1 · 2023-08-13T16:26:44.717

3

The two cases are the same. Think of labels / symbols in assembly as being like C `extern char symbol[]`: using the bare name gives you the address.

As Nate pointed out in comments, this is true in other contexts as well:

.word 0xdeadbeef       @ constant 4-byte value
.word var              @ address as 4-byte value
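A minimal C sketch of the same idea, assuming a hypothetical 4-byte object named `var` (the names `var`, `words`, `address_of_var`, and `load_byte_at` are made up for illustration and are not from any toolchain). An array name used as a value yields its address, so a table slot can hold a literal number or a symbol's address in the same way, much like the two `.word` lines:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object standing in for the assembly label `var`. */
unsigned char var[4] = { 0x11, 0x22, 0x33, 0x44 };

/* Like `.word 0xdeadbeef` followed by `.word var`: each slot just holds a
   number. For the second slot the toolchain works the number out. */
const void *const words[2] = { (const void *)(uintptr_t)0xdeadbeefu, var };

/* Roughly what `LDR R0,=var` produces: the address, not the contents. */
uintptr_t address_of_var(void)
{
    return (uintptr_t)var;
}

/* Roughly what a following `LDRB R0,[R0]` does: a separate load step. */
unsigned char load_byte_at(uintptr_t address)
{
    assert(address != 0);
    return *(const unsigned char *)address;
}
```

Calling `load_byte_at(address_of_var())` mirrors the two-step sequence in the question: first materialize the number, then dereference it.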

The = in LDR R1,=0x40021010 is there to tell the assembler it's a pseudo-instruction that should materialize that value in a register, instead of an addressing-mode. ARM doesn't have a [12-bit absolute] addressing mode AFAIK, but it does have a PC-relative addressing mode which could conceivably make sense for symbol names.

So there is ambiguity, and it's just easier to parse and for humans to read if there's a special character that indicates it's not just an ldr machine instruction. (It might be an ldr plus extra data assembled into a literal pool, or it might be an immediate move such as mov, mvn, or movw, or a movw/movt pair, depending on the assembler and target options.)
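To make the choice of expansion concrete, here is a rough C model of how an assembler might classify the constant in `ldr Rd, =const` for ARM (A32) state. It assumes the classic A32 data-processing immediate (an 8-bit value rotated right by an even amount) and a core that has `movw`. The enum and function names are hypothetical, and real assemblers differ in detail (Thumb encodings, target options, and symbol addresses that are only known at link time):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the possible expansions of `ldr Rd, =const`. */
enum expansion { USE_MOV, USE_MVN, USE_MOVW, USE_LITERAL_POOL };

/* Rotate left by n bits, n in 0..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* True if v is an 8-bit value rotated right by an even amount. */
bool a32_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by `rot`; the result must fit in 8 bits. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* Simplified decision order: mov, then mvn, then movw, else literal pool. */
enum expansion classify_ldr_equals(uint32_t v)
{
    if (a32_imm_encodable(v))  return USE_MOV;
    if (a32_imm_encodable(~v)) return USE_MVN;
    if (v <= 0xFFFFu)          return USE_MOVW;
    return USE_LITERAL_POOL;
}
```

Under this model, `0x40021010` from the question falls through to the literal-pool case. A symbol address is generally not known until link time, so it typically ends up in a literal pool as well.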

In some assemblers for other ISAs, there are contexts where a bare symbol name is not treated as an address, notably MASM for x86. But other x86 assemblers like NASM are more consistent: mov edi, symbol uses the address as an immediate, while mov eax, [symbol] is required to load from it. See [Why in NASM do we have to use square brackets (\[ \]) to MOV to memory location?](https://stackoverflow.com/q/49534661)

In AT&T syntax for x86, symbols and literal numbers with no decoration are treated the same, but as memory operands. mov 123, %eax is a load from absolute address 123, same as mov foo, %eax is a load from the address of the symbol foo, using a [disp32] addressing mode. (To mov-immediate a literal number or a symbol address, mov $123, %eax or mov $foo, %eax.) In other contexts, like .long foo, symbol names are addresses, because there's nothing else you could do with them that the assembler needs to disambiguate.

edited Aug 13 '23 at 16:26

answered Aug 13 '23 at 16:18


Indeed they are the same. It is very difficult to encode a 32 bit constant in a 32 bit instruction and make an ISA meaningful. – artless noise Aug 14 '23 at 21:00
@artlessnoise: That seems like an argument against needing `=` at all, i.e. that `ldr r0, symbol` wouldn't make sense, so should be treated as `ldr r0, =symbol`. But the same syntax with smaller integers like `ldr r0, 16` could plausibly be a load from absolute address `16` (except ARM doesn't have an addressing mode with just an absolute displacement and no register, so it would only be encodable in machine code as a PC-relative load when the instruction is within the 12-bit offset range, about ±4 KiB, of that address). Or you could say `ldr r1, 0x40021010` could be treated as a pseudo-instruction that loads from that address (probably after constructing it in R1). – Peter Cordes Aug 14 '23 at 21:07
But yes, the fact that only a subset of the 32-bit integers can fit in a fixed-length machine instruction is highly relevant to how `ldr reg, =const` can assemble: it often has to be 2 instructions or a load from a literal pool. (ARM and Thumb pick interesting subsets, not just 0 to 2^n-1 like many RISCs, as you said in your next comment.) That is unlike x86, where instructions can be longer for full-size immediates. Still, I didn't get into it since syntax like `ldr r1, symbol` can still be meaningful as a PC-relative load from a nearby symbol, so the ambiguity exists in actual ARM. – Peter Cordes Aug 14 '23 at 21:10
There is a `mov Rx, #constant`, but it is limited to an 8-bit value, rotated. There is also a 'short address' mode (that I have never seen used) where you can do something like `ldr r0, 16`. I was not commenting on the syntax, just that it is impossible to fit in one instruction. – artless noise Aug 14 '23 at 21:10
[ARM tag with ldr and equals](https://stackoverflow.com/search?q=%5Barm%5D+ldr+%26equals%3B); there are actually more than this list... Similar [pseudo-ops Q/A](https://stackoverflow.com/questions/40778734/difference-between-pseudo-op-and-machine-op). – artless noise Aug 15 '23 at 13:56
