Why does not XORing %eax cause a segfault?

Question

Having this:

        .text
    str:
        .string "string"
        .globl main
    main:
        xor %eax, %eax   # commenting this line out causes a segfault
        leaq str(%rip), %rdi
        call printf
        xorq %rdi, %rdi
        call exit

Does `printf` use `%rax`? Or is the segfault caused by `str(%rip)`? As I understand it, `leaq str(%rip)` uses the address `%rip + str`. But `str` is an address, not a small offset (like 4, 8 or 16), so what does `leaq str(%rip)` actually compute?

Compiled with `cc foo.s`.

As per the sysv abi convention `%al` (which is the low byte of `%rax`) should contain the number of SSE registers used to pass arguments. It is zero in your case. This, coupled with the fact that you misalign the stack causes the fault. The `leaq str(%rip)` just loads the address of `str` in a position independent manner. — Jester, Jun 04 '20 at 13:44
@Jester I do not understand. 1) For which function is `%al` important? What are SSE registers and what are they used for? 2) How do I misalign the stack? 3) How does `leaq str(%rip)` load the address of `str` when the offset `str` before `%rip` is not a number (`str` is either its address or the ASCII value it represents)? What will be loaded into `%rdi` when `%rip` is moved by the offset `str`? I need full answers to all questions — autistic456, Jun 04 '20 at 13:54
SSE registers are the `%xmm`. They are used to pass floating point numbers. `%al` is important for varargs (`...`) functions, and `printf` is such. The `call` pushes 8 bytes onto the stack and you need to balance that with another 8 bytes to make a multiple of 16 as per convention. `str(%rip)` is just syntax for the address of `str` in a position independent manner. The assembler/linker will emit the correct offset instead of the actual address and the cpu will calculate it at runtime. — Jester, Jun 04 '20 at 14:01
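A minimal sketch combining the two points above; it is not code from the thread. It keeps the `xor` so that `%al` is 0 and adds a `subq $8, %rsp` so the stack is 16-byte aligned at each `call`. The trailing newline in the string, the `@PLT` suffixes and the `.note.GNU-stack` line are additions that assume a typical x86-64 Linux toolchain.

```asm
    .text
str:
    .string "string\n"

    .globl main
main:
    subq  $8, %rsp            # on entry %rsp is 8 past a 16-byte boundary; realign it
    xorl  %eax, %eax          # %al = 0: no vector registers carry arguments
    leaq  str(%rip), %rdi     # first argument: address of the format string
    call  printf@PLT
    xorl  %edi, %edi          # exit status 0
    call  exit@PLT            # exit() flushes stdout and does not return

    .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```

It can be built the same way as the original, with `cc foo.s`.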
But the syntax `offset(%register)` should mean a numeric offset from the current value in `%register` (indirect addressing). Why is this different? — autistic456, Jun 04 '20 at 14:06
@Jester Also, does the `%rip` register read "upwards" (say it reads an opcode from address 0x0 to 0x7)? Because then the offset would be `%rip + str`. And if instructions are read "upwards", that is the opposite of the stack, which grows "downward". That is a little bit misleading. — autistic456, Jun 04 '20 at 14:11
For convenience. The assembler does turn it into `offset(%rip)` but it calculates the proper offset for you such that `%rip+offset = address`. Instead of having to type stuff like `str-.-7(%rip)` you get to type `str(%rip)`. The direction of stack growth has nothing to do with this. Maybe you are confusing `%rip` with `%rsp`? — Jester, Jun 04 '20 at 14:15
[Segmentation fault on printf - NASM 64bit Linux](https://stackoverflow.com/q/25693827) answers the question of why: because on non-zero AL, variadic functions dump the XMM regs to the stack with aligned stores. Disassembly in [Assembly - Passing parameters to a function call](https://stackoverflow.com/q/38727551) — Peter Cordes, Jun 04 '20 at 14:24
@Jester, can you please explain the `str-.-7(%rip)` addressing mode? According to `https://stackoverflow.com/questions/54822792/how-to-get-the-size-of-a-function-in-bytes-in-gnu-assembler-with-intel-syntax`, the `-` (minus) operator between two address operands calculates the offset. If I apply this to your displacement, I would translate it as `(addressOf(str)-currentPosition-7)`. That does not make sense to me, since the first operand, `addressOf(str)`, is less than `currentPosition`, so the offset would be negative. Could you please explain? — autistic456, Jun 04 '20 at 16:38
Negative offset makes perfect sense since `str` is before the `lea`. Thus you need to subtract from `rip`. — Jester, Jun 04 '20 at 17:56
@Jester And why is there `-7` at the end? It is not even a full byte — autistic456, Jun 04 '20 at 19:33
Addresses use bytes not bits, so it is a full byte. Seven of them :) It's because `%rip` points past the instruction and that `lea` happens to take up 7 bytes. — Jester, Jun 04 '20 at 21:11
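To make the `-7` concrete, here is a small sketch; the register choices and the comparison are illustrative additions. Both `lea` instructions should produce the address of `str`. The first lets the assembler compute the displacement. The second spells it out as `str - . - 7`, where `.` is the address of the start of that `lea` and 7 is its length. This assumes `str` lives in the same section as the code, as in the question.

```asm
    .text
str:
    .string "string"

    .globl main
main:
    leaq  str(%rip), %rax         # assembler computes: str - (address of the next instruction)
    leaq  str-.-7(%rip), %rdx     # same displacement by hand: "." is this lea, which is 7 bytes long
    xorl  %ecx, %ecx
    cmpq  %rdx, %rax
    setne %cl                     # %cl = 1 only if the two addresses differ
    movl  %ecx, %eax              # main's return value becomes the exit status
    ret

    .section .note.GNU-stack,"",@progbits
```

After `cc foo.s && ./a.out; echo $?` the reported status should be 0 if the two forms agree.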
@Jester I did not mean bits; I wondered why the offset is 7 *bytes* given 8-byte boundary alignment. But as you said, it is because of the length of the `lea(q)` instruction. Anyway, how do you know the `lea` instruction takes 7 bytes? I haven't found it in the Intel docs, and this information (the length of various instructions) is needed in order to work with the `%rip` register. So if I know the length of all instructions in the `.text` section, I can calculate the size of the executable part of the program? And how do I work with the `%rip` register then? — autistic456, Jun 04 '20 at 22:35
It is in the intel docs, but I took a shortcut and just assembled it :) In the instruction set reference you can see `REX.W + 8D /r` which means 2 bytes, then a modrm byte encoding the RIP-relative addressing mode, then 4 bytes for the offset. `%rip` always points past the instruction in which it is referenced. In general you do not care. You just write `foo(%rip)` and let the tools figure it out for you. — Jester, Jun 04 '20 at 22:43
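The byte count described above can be reproduced by emitting the instruction by hand. This is only a sketch to show where the 7 bytes go; the choice of `%rdx` as the destination and the comparison at the end are illustrative additions. The ModRM byte `0x15` encodes mod=00, reg=010 (`%rdx`), r/m=101 (RIP plus a 32-bit displacement).

```asm
    .text
str:
    .string "string"

    .globl main
main:
    leaq  str(%rip), %rax     # what the assembler emits for us (7 bytes)

    # The same instruction form written out byte by byte, targeting %rdx:
    .byte 0x48                # REX.W prefix: 64-bit operand size
    .byte 0x8d                # opcode: LEA
    .byte 0x15                # ModRM: mod=00, reg=010 (%rdx), r/m=101 (RIP + disp32)
    .long str - . - 4         # disp32: target minus the address just past these 4 bytes

    xorl  %ecx, %ecx
    cmpq  %rdx, %rax
    setne %cl
    movl  %ecx, %eax          # exit status 0 if both addresses are equal
    ret

    .section .note.GNU-stack,"",@progbits
```

In practice nobody writes this by hand; `objdump -d a.out` shows the same bytes for an ordinary `leaq str(%rip), %rdx`.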
@Jester Figure it out? Is `as` that smart? Does `as` have some shortcuts or aliases, or how does it know what I want? Is it cleverer than gcc, which has numerous standard rules on how to write code correctly? What do you mean by that? Where is this "figuring out" behaviour documented? — autistic456, Jun 04 '20 at 22:46
It knows what you want because that is the syntax. If you write `foo(%rip)` you say you want a rip-relative address for `foo`. It is [documented in the manual](https://sourceware.org/binutils/docs/as/i386_002dMemory.html). — Jester, Jun 04 '20 at 22:50
@Jester Anyway, in order to load the address of the string into `%rdi`, why can't I use `mov str, %rdi`, when the symbol `str` (which holds the ASCII value) is itself an address? Why do I need RIP-relative addressing? I thought every symbol without the `$` immediate prefix is an address, so why does it not work? — autistic456, Jun 04 '20 at 23:47
It works **with** the `$`. Note that this is an absolute address and depending on environment and linker settings it is not allowed. Also it is a 32 bit constant. If you want 64 bit you need to use `movabsq`. — Jester, Jun 04 '20 at 23:54
@Jester I do not get it. Why does it work with an immediate, when I want to pass the *address*, not the *ASCII value* it stores? Since `$` gives an immediate, it is a value, not an address, so it does not make sense. Also, using `lea str, %rdi` works in my case, so I assume `mov` passes the *value* while `lea` really passes the *address*. That makes sense, but **with** `$` it does not — autistic456, Jun 05 '20 at 00:00
@Jester Also, what does `movabs` mean? I only know mov(suffix), where the suffix hints at the size of the operands (rax/eax/ax/ah, al). But what does the `abs` (absolute) suffix mean? It does not seem to be a size specifier, so how does `as` know what operand size it has? — autistic456, Jun 05 '20 at 00:02
`$` means immediate. Without `$` you get a memory reference ("the value"). The instruction contains the address either way, just a different opcode. `movabsq` is a gnu invention to mean "mov 64 bit immediate". The `q` suffix applies to the operation size which is 64 bit for e.g. `movq $1, %rax` as well but that does not specify the size of the immediate. They had to come up with a way to select which size you want and they didn't like `movl $1, %rax` because that has a mismatch between `l` and `%rax`. — Jester, Jun 05 '20 at 00:10
*$ means immediate. Without $ you get a memory reference ("the value").* So when it is used as an immediate, as in `mov $1, %rax`, it is a value. But when it is used with a symbol, as in `mov $str, %rax`, it is an address? What? It is exactly the opposite in different modes (according to what you said)? And then why is there a need for something like `movl $1, %rax` (I suppose to zero the most significant 32 bits, the left part?), when `movq $1, %rax` does the same? When I pass `$1`, the left 32 bits should also be zeroed, so why mix them? — autistic456, Jun 05 '20 at 00:26
@Jester [continuing] In other words, why specify `int 1` and `long long 1`, when both mean the same and in both cases the left 32-bit part is zeroed? — autistic456, Jun 05 '20 at 00:30
"value" meaning "value in memory". `$` means immediate, or "value in instruction" if you like. So `movq 1, %rax` will try to load from memory address 1. `movq $1, %rax` will just use `1` as a constant. Similarly, `movq foo, %rax` tries to load from memory at address `foo` and `movq $foo, %rax` just uses the address of foo as a constant. There is no `movl $1, %rax`. I said that could have been an alternative solution to a problem to have `movl $1, %rax` for 32 bit and `movq $1, %rax` for 64. Instead we have `movq $1, %rax` for 32 bit and `movabsq $1, %rax` for 64. — Jester, Jun 05 '20 at 00:31
@Jester OK, that is what I said. But then why did you say about my `mov str, %rdi` that *it works with the $*, when `printf` needs an address, not a value, and we now agree `$` is a value? As for `movabsq`, I still do not see the difference between `movq $1, %rax` and `movabs $1, %rax`. Yes, the first one will zero the left 32 bits, but the value is still `1`, so it makes no difference anyway. Again, I would need an example to see its usage in the real world — autistic456, Jun 05 '20 at 00:39
No `$` is **not value from memory**. It's "value stored in instruction" which for a symbol is its address. For the second question, yes of course it makes no difference for `1` as that fits into 32 bits. But you don't know if `foo` fits or not, or if you want to stuff that `1` into a 64 bit immediate anyway because you later want to patch or relocate it. Incidentally, gas knows you want `movabsq` if the constant does not fit into 32 bits, so `movq $0x12345678abcdef0, %rax` assembles as if you used `movabsq`. — Jester, Jun 05 '20 at 00:41
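The difference between the immediate sizes is easiest to see in the encodings. The sketch below is only an illustration; the byte counts in the comments are what GNU `as` is expected to emit when no optimization options are given, and the program itself does nothing but return 0.

```asm
    .text
    .globl main
main:
    movl    $1, %eax                    # b8 + imm32: 5 bytes; writing %eax zeroes the upper half of %rax
    movq    $1, %rax                    # REX.W c7 /0 + imm32: 7 bytes; imm32 is sign-extended to 64 bits
    movabsq $1, %rax                    # REX.W b8 + imm64: 10 bytes; a full 64-bit immediate
    movq    $0x12345678abcdef0, %rdx    # does not fit in 32 bits, so gas picks the 10-byte form
    movq    $-1, %rcx                   # fits as a sign-extended imm32: %rcx = 0xffffffffffffffff
    xorl    %eax, %eax                  # return 0
    ret

    .section .note.GNU-stack,"",@progbits
```

Running `objdump -d a.out` on the result shows the individual encodings.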
@Jester Thanks for finally clarifying it. But that makes me hesitate: if an immediate is a value stored in the instruction, then the value differs among the symbols in a program. So that raises the question: what kinds of symbols can a program have? If there is an immediate of type address, or "value", what else could a symbol represent, and how can I find out? For now it seems the `$` could store a value not known to me, since it depends on the particular instruction it is used in. So what types/kinds of symbols are there in the assembler (we would say "types" in C)? — autistic456, Jun 05 '20 at 00:47
There are no types. The instruction you use specifies the type. You can use `movl` to load a `float` if you like. Or load the first 4 characters of your string. Some assemblers attempt to do some type checking as a friendly service (e.g. masm). — Jester, Jun 05 '20 at 00:58
@Jester No, I don't mean types the way you mean (type checking), but rather what particular instructions take as their operands. You said *which for a symbol is its address*, where I emphasize *for a symbol*. It sounds like it is not about the operands (as you said, I can load a `float` if I want), but rather about the instruction. So as far as I know, the only "types" or "pseudo-types" are numbers, floats and addresses? Or is there anything *else* that could be in an instruction/opcode as an immediate (or in another way)? — autistic456, Jun 05 '20 at 01:12
@Jester I understand that the `$` immediate value for a symbol is its address. But then why is a symbol *without* the immediate prefix (`mov str, %rax`) the *value at that address*? When does the *dereferencing* come in? — autistic456, Jun 05 '20 at 10:32
The cpu does the dereferencing at runtime. The "mov immediate" and "mov from memory" are really two different instructions with the same name. They have different opcode. — Jester, Jun 05 '20 at 10:45
@Jester Yes, but then it is misleading in that `$` means either a value or an address depending on the instruction. In `mov $symbol, %register`, `$` means an address. In `mov $0x1, %register`, it means a value. And in `mov 0xfffffff, %register` there is an address, but **without** `$`. So it is a little bit misleading, because it depends on the instruction used — autistic456, Jun 05 '20 at 10:48
`$` selects the "mov immediate" instruction in both the `mov $symbol, %register` and the `mov $1, %register`. The symbol itself is the address, it always is. Even in `mov symbol, %register` the instruction contains the address but without the `$` the "mov from memory" opcode is used. In C you could say one is `register = &symbol;` and the other is `register = load(&symbol);`. Both cases have the address, but the second does a load from memory. — Jester, Jun 05 '20 at 11:06
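The `register = &symbol` versus `register = load(&symbol)` distinction can be sketched in a position-independent form as follows. The symbol `val` and its contents are made up for the illustration. The absolute forms discussed in the thread are shown only as comments, since they typically require linking with `-no-pie`.

```asm
    .data
val:
    .quad 42

    .text
    .globl main
main:
    leaq  val(%rip), %rax     # %rax = &val  (the address itself)
    movq  val(%rip), %rcx     # %rcx = 42    (a load from memory at val)
    movq  (%rax), %rdx        # %rdx = 42    (explicit dereference of the address in %rax)

    # Absolute forms of the same idea:
    #   movq $val, %rax       # "mov immediate":   %rax = &val
    #   movq val, %rcx        # "mov from memory": %rcx = 42

    subq  %rdx, %rcx          # 42 - 42 = 0
    movl  %ecx, %eax          # exit status 0
    ret

    .section .note.GNU-stack,"",@progbits
```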
@Jester Please explain: where in `mov $1, %register` is the address? There is no symbol. And for the second: could I assume the default opcode is `load` (without `$`)? Whenever I specify a plain symbol in assembly, the default is to load from it? Just to clarify — autistic456, Jun 05 '20 at 11:10
@Jester And what if I do `lea $symbol, %reg`? (What is the difference between an immediate (address) and an explicit instruction to find the address, `lea`?) — autistic456, Jun 05 '20 at 11:15
`lea` takes a full effective address not just an immediate so you can do stuff like `lea foo(%rax, %rbx), %rcx` which you can not do with a `mov` immediate. `1` can be an address but you usually specify addresses with symbols. E.g. `movq 1, %rax` will try to read from memory at address `1` (and likely crash but that's another matter). In the "mov immediate" the cpu just copies the bits. `mov $0x3F800000, %eax` would load that integer or if you look at it as float it's `1.0`. If you later used it to access memory it would be an address. The cpu doesn't care. — Jester, Jun 05 '20 at 11:30
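The point that the CPU just copies bits can be tried directly. The sketch below is an illustration, not code from the thread: it moves the integer `0x3F800000` into a vector register and prints it as a floating-point number. It also shows the `%al` convention from the start of the discussion: one vector register carries an argument, so `%eax` is set to 1 before the call. The format string and the `@PLT` suffix are assumptions for a typical x86-64 Linux setup.

```asm
    .section .rodata
fmt:
    .string "%f\n"

    .text
    .globl main
main:
    subq     $8, %rsp             # keep the stack 16-byte aligned for the call
    movl     $0x3F800000, %eax    # integer bit pattern of the float 1.0
    movd     %eax, %xmm0          # copy the same 32 bits into %xmm0
    cvtss2sd %xmm0, %xmm0         # printf expects a double for %f
    leaq     fmt(%rip), %rdi
    movl     $1, %eax             # %al = 1: one vector register holds an argument
    call     printf@PLT           # prints 1.000000
    xorl     %eax, %eax           # return 0
    addq     $8, %rsp
    ret

    .section .note.GNU-stack,"",@progbits
```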
@Jester `foo(%rax, %rbx)`: according to `[base + idx*scale + displacement]`, rax is the base and rbx is the index, but there is no scale. So is it equivalent to `foo(%rax)`? — autistic456, Jun 05 '20 at 11:54
[Per the manual](https://sourceware.org/binutils/docs/as/i386_002dMemory.html): _"If no scale is specified, scale is taken to be 1."_. So it's equivalent to `foo(%rax, %rbx, 1)` or `[rax + rbx * 1 + foo]` in intel syntax. — Jester, Jun 05 '20 at 12:34
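A small sketch of the default scale, using made-up numbers. It uses a numeric displacement instead of `foo`, and `%rdx` instead of `%rbx` as the index, so that nothing callee-saved is clobbered and no absolute symbol address is needed. The program exits with status 0 if the forms with and without an explicit scale of 1 give the same result.

```asm
    .text
    .globl main
main:
    movl  $10, %eax               # base  = 10
    movl  $3, %edx                # index = 3
    leaq  8(%rax,%rdx), %rcx      # 10 + 3*1 + 8 = 21 (scale defaults to 1)
    leaq  8(%rax,%rdx,1), %rsi    # same thing with the scale written out: 21
    leaq  8(%rax,%rdx,4), %rdi    # 10 + 3*4 + 8 = 30
    xorl  %eax, %eax
    cmpq  %rsi, %rcx
    setne %al                     # exit status 0 if the two forms agree
    ret

    .section .note.GNU-stack,"",@progbits
```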
