Context

I'm working through some examples in a book by Jonathan Bartlett titled "Learn to Program with Assembly" (2021). The author assumes a Linux environment. I'm on OSX (Monterey). He's using gcc. I've got clang (v 13.1.6). In chapter 7 the author introduces laying out data records.

To facilitate this, he uses the .equ directive to define some constants in a file titled persondata.s which happens to only contain a data segment. For example:

# Describe the components of the struct
.globl WEIGHT_OFFSET, HAIR_OFFSET, HEIGHT_OFFSET, AGE_OFFSET
.equ WEIGHT_OFFSET, 0
.equ HAIR_OFFSET, 8
.equ HEIGHT_OFFSET, 16
.equ AGE_OFFSET, 24

In another file, tallest.s, he makes use of the HEIGHT_OFFSET constant to access the height of a person record. This file has only a text segment.

movq HEIGHT_OFFSET(%rbx), %rax

The Problem

When I assemble tallest.s using the built-in tools on OSX, the assembler complains that I'm trying to use 32-bit absolute addressing in 64-bit mode.

The Question

How is this supposed to work on OSX? How am I supposed to make use of .equ defined constants?

Things I Tried

If I merge these two files into one file, then the assembler doesn't complain. It treats HEIGHT_OFFSET as the constant that it is.

I presume the idea is to have constants defined along with the data, and then make use of those constants in code to avoid 'magic numbers'. Sounds like a good idea.

I tried assembling, linking, and running this code using the book's docker image (johnnyb61820/linux-assembly). It works. No complaints. Some details:

# as -v
GNU assembler version 2.31.1 (x86_64-linux-gnu) using BFD version (GNU Binutils for Debian) 2.31.1
^C
# ld -v
GNU ld (GNU Binutils for Debian) 2.31.1
# uname -a
Linux eded2adb9c06 5.10.124-linuxkit #1 SMP Thu Jun 30 08:19:10 UTC 2022 x86_64 GNU/Linux

So it works as written under that set-up. Just not under my set-up which is clang (v 13.1.6).

Based on the fact that this works in the linuxkit docker image, I thought to install gcc via homebrew on my machine. This got me version 12.2.0 of gcc, which I used to try and compile/link my files. It also thinks HEIGHT_OFFSET is a problem due to 32-bit absolute addressing in 64-bit mode.

Based on the output of uname -a in the docker image, I'm guessing it is 64-bit: Linux eded2adb9c06 5.10.124-linuxkit #1 SMP Thu Jun 30 08:19:10 UTC 2022 x86_64 GNU/Linux

Oddly enough, the Linux toolchain in the docker image doesn't complain about 32-bit absolute addressing. Under OSX, I had to make everything RIP-relative to access any static data (true for both gcc and clang). Makes me wonder what the Linux tools are doing with these addresses.
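To make the RIP-relative point concrete, here is a minimal sketch of the two ways of addressing static data. The label `record_count` is made up for illustration and is not from the book.

```asm
.data
record_count:                 # hypothetical label, not from the book
    .quad 3

.text
    # Absolute form: the address of record_count has to fit in a 32-bit
    # absolute field. This is the form the OSX tools reject in 64-bit code,
    # so it is left commented out.
    # movq record_count, %rax

    # RIP-relative form: the assembler/linker store the distance from the
    # end of this instruction to record_count, so no absolute address is needed.
    movq record_count(%rip), %rax
```

The second form is what I ended up using everywhere for static data on OSX.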

As a possibly final note, under OSX yasm also doesn't like me using .equ defined constants from another file. It complains about wanting to make use of "32 bit absolute relocations" in 64 bit mode. GCC (12.2.0) and llvm-mc (13.0.1) also take issue with the HEIGHT_OFFSET constant.

Chris
  • Does `.globl` make any sense on constants? I don't see it. I think that is doing more harm than good. – Erik Eidt Sep 01 '22 at 20:29
  • You can use the C preprocessor's #include feature: store the constants in a separate header file and include it wherever needed. Otherwise, some assemblers support an include notion. – Erik Eidt Sep 01 '22 at 20:29
  • Just a general suggestion: I don't know about clang, but gcc supports passing asm code through the C preprocessor (name the asm file `.S`) so you can use C-style `#define` constants (which you can then share with C code via `#include`). – sj95126 Sep 01 '22 at 20:44
  • @ErikEidt: or GAS `.include` can include an asm source file. But yeah, for include-path search reasons you might prefer the C preprocessor. – Peter Cordes Sep 01 '22 at 20:52
  • It could possibly work with `.set HAIR_OFFSET, 8`, but the other files using it will at best be using `[reg + disp32]` addressing modes instead of `[reg + disp8]` (1-byte displacement), since they can't see the constant value at assemble time. They have to leave it for the linker to fill in, like any other undefined symbol that GAS assumes is external. If the tools still complain about absolute addressing, they're not designed for this sub-optimal way of using them. – Peter Cordes Sep 01 '22 at 20:55
  • Oh, you're on MacOS. I think MachO64 probably doesn't have a relocation type for 32-bit absolute, but ELF does. So you *could* abuse ELF to make this happen the way you're trying (via the symbol table), instead of the normal efficient way using `.include` to make the constant visible at assemble time instead of only at link time. – Peter Cordes Sep 02 '22 at 00:37
  • @PeterCordes If I use .include, is that literally the same as putting everything in one file and not separating the data and text? – Chris Sep 02 '22 at 01:06
  • Right, you'd want to move the `.equ` constants to a `.h` or whatever file extension you want to use for assembler include files. Exactly the same situation as in C, where you want to make a `#define` visible across files, or a `static const int` – Peter Cordes Sep 02 '22 at 01:30
  • I don't know what you're hoping to accomplish with `movq HAIR_OFFSET(%rip), %rbx`. Are you *trying* to load 8 bytes of machine code from 16 bytes after the end of this instruction? Because that's what you'd expect if a `.equ` was visible. But since it isn't, it's actually trying to generate a load from that symbol address, addressed wrt. RIP. i.e. the `rel32` has to be `-RIP + 16` to reach absolute address `16`, the symbol address you defined in another `.o`. See [this Q&A](https://stackoverflow.com/questions/54745872/how-do-rip-relative-variable-references-like-rip-a), also – Peter Cordes Sep 02 '22 at 17:11
  • Also [Distinguishing memory from constant in GNU as .intel\_syntax](https://stackoverflow.com/q/39355188) is highly relevant for how GAS is interpreting `HAIR_OFFSET`. Anyway, a `mov` from memory is never going to put `16` (the symbol "address") into a register. Also related: [How to load address of function or label into register](https://stackoverflow.com/q/57212012) . – Peter Cordes Sep 02 '22 at 17:13
  • What is your actual goal here? To learn about relocations and how GAS interprets symbols? Or to make asm that can efficiently use the same assemble-time constant in multiple files? The answer to the 2nd question doesn't involve the symbol table at all, it involves some kind of `.include` or `#include` to let all files see the constants at *assemble* time, not link time. – Peter Cordes Sep 02 '22 at 17:14
  • Re: your last edit: yeah, this is a matter of [How do RIP-relative variable references like "\[RIP + \_a\]" in x86-64 GAS Intel-syntax work?](https://stackoverflow.com/q/54745872) - `mov HAIR_OFFSET(%rip), %rbx` means load from that symbol address, using a RIP-relative addressing mode to reach its address. That's why `mov some_global(%rip), %eax` does what you want (loading from the bytes at label `some_global: .int 123`), instead of needing `some_global - (. + 6)(%rip)` to actually do the RIP-relative calc yourself; `.` is the address of the start of the instruction, `.+6` is its end. – Peter Cordes Sep 02 '22 at 17:49
  • Probably you should ask that as a new question, because posting those last few comments as an answer wouldn't answer the title question about how to use `.equ` across files. – Peter Cordes Sep 02 '22 at 17:51
  • Or IDK, I guess the answer to the right way to do this is pretty short, just `.include` or `#include` and put constant definitions in a separate file. – Peter Cordes Sep 02 '22 at 18:18
  • @PeterCordes the `movq HAIR_OFFSET(%rip), %rbx` was just an experiment to see what would happen with the HAIR_OFFSET symbol. What I really wanted to do was use `movq HAIR_OFFSET(%rbx), %rax` to access the HAIR field of the person record. Works on linux with gcc as and ld. It's supposed to add 16 bytes to the %rbx pointer and dereference it. The connection is that HAIR_OFFSET is a `.equ` constant – Chris Sep 03 '22 at 01:57
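
The `.include` approach suggested in the comments above could look roughly like the sketch below. The file name `persondata_defs.s` and the label `get_height` are made up for illustration and do not come from the book; the offsets are the ones shown in the question. It assumes both files sit in the same directory (or that the directory is passed with `-I`). The constants move out of the symbol table into a file that every source pulls in, so the assembler sees plain numbers and emits no relocation.

```asm
# persondata_defs.s (hypothetical file name)
# Constants only: no .globl and no section directives.
# Describe the components of the struct
.equ WEIGHT_OFFSET, 0
.equ HAIR_OFFSET, 8
.equ HEIGHT_OFFSET, 16
.equ AGE_OFFSET, 24
```

```asm
# tallest.s (excerpt)
.include "persondata_defs.s"     # constants are now visible at assemble time

.text
# Hypothetical helper: expects %rbx to point at a person record.
get_height:
    # HEIGHT_OFFSET is a known number here, so this assembles as 16(%rbx)
    # with no relocation left for the linker to fill in.
    movq HEIGHT_OFFSET(%rbx), %rax
    ret
```

persondata.s can include the same file if the data definitions need the offsets too. The C preprocessor route from the comments is the same idea with `#define` and `#include` in a `.S` file.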

0 Answers