To get the length of a string, I am using the following function:

string:     .asciz "hello world!\n"
get_string_length:
    mov $0, %eax    # size goes in rax
  .L1_loop:
    movzbw string(,%eax,1), %ebx
    cmp $0, %ebx
    je .L1_exit
    inc %eax
    jmp .L1_loop
  .L1_exit:
    ret

However, I have also seen the following:

hello_world:
    .ascii "hello world\n"
    hello_world_len = . - hello_world

How does the second version work? Specifically, how does the `.` notation produce the length? For example, it is used in this GitHub snippet: https://github.com/cirosantilli/linux-kernel-module-cheat/blob/9dccafe00d9b0affa8847836a71ebb4c37be7090/userland/arch/x86_64/freestanding/linux/hello.S

samuelbrody1249
  • Note that `movzbw string(,%eax,1), %ebx` has an operand-size mismatch: the destination register is a dword (`l` size), but the instruction suffix is `w`. GAS surprisingly doesn't warn or error and just assembles it as `movzbl`. Also, it's pointlessly inefficient (code size) to force EAX as an index instead of a base. Also, EBX is normally call-preserved; ECX or EDX would be the normal choice for another temporary register. – Peter Cordes Sep 16 '20 at 23:59
  • @PeterCordes thanks for the feedback, so it should be `movzbl`, correct? And do you mean by doing `string(,%eax,1)` vs. `string(%eax)` ? – samuelbrody1249 Sep 17 '20 at 02:51
  • Yes, `movzbl string(%eax), %ecx`. You're just adding a byte offset whether it's the base or the index in the x86 addressing mode. – Peter Cordes Sep 17 '20 at 03:05
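
Putting the suggestions from the comments above together, the run-time version might look like the following sketch. It assumes, like the question's code, absolute addressing (32-bit code, or a non-PIE x86-64 link with the string at a low address); the `.rodata` placement and the use of `xor`/`test` are illustrative choices, not something the comments prescribe.

```asm
    .section .rodata
string:
    .asciz "hello world!\n"

    .text
    .globl get_string_length
get_string_length:
    xor     %eax, %eax              # length counter starts at 0
.L1_loop:
    movzbl  string(%eax), %ecx      # load one byte, zero-extended; EAX is the base
    test    %ecx, %ecx              # is it the terminating NUL?
    je      .L1_exit
    inc     %eax
    jmp     .L1_loop
.L1_exit:
    ret                             # EAX = length, NUL not counted
```

ECX is used as the scratch register so that the call-preserved EBX is left alone.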

1 Answer

The first version determines the length at run-time and the second version sets the length at assembly time.

The `.` in the second version is the assembler's location counter: the current address in the section being assembled (here, the section that holds the string). Then, the expression

hello_world_len = . - hello_world

subtracts the starting address of the string `.ascii "hello world\n"` (marked by the label `hello_world:`) from the current address (the `.`). The result is the length of the string in bytes, which is assigned to `hello_world_len`.
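
As a minimal sketch of how such a constant is typically used, assuming x86-64 Linux and a freestanding program (the `.rodata` placement and the `_start` body are illustrative, not copied from the linked file):

```asm
    .section .rodata
hello_world:
    .ascii "hello world\n"
    hello_world_len = . - hello_world   # evaluated by the assembler: 12

    .text
    .globl _start
_start:
    mov     $1, %eax                    # write(2) syscall number on x86-64 Linux
    mov     $1, %edi                    # fd 1 = stdout
    lea     hello_world(%rip), %rsi     # address of the string
    mov     $hello_world_len, %edx      # length, encoded as an immediate
    syscall

    mov     $60, %eax                   # exit(2)
    xor     %edi, %edi                  # status 0
    syscall
```

The assignment has to come directly after the string, in the same section; otherwise `.` no longer points at the end of the string and the difference is not its length.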

zx485
  • thank you very much that was exactly what I was looking to learn. Is `.` replaced when assembled with its current address? So that, for example, `.` = 10, and `hello_world` = 5. And so hello_world_len = 10 - 5 --> Substitute the word "hello_world_len" with the numeric value `5` in the elf file? – samuelbrody1249 Sep 16 '20 at 23:18
  • Pretty much, yes. But I would prefer the word "set" instead of "substitute", because it is rather an evaluation of an expression than a textual "substitution". But that's only a formal distinction, and your thinking is applicable. – zx485 Sep 16 '20 at 23:24
  • one last thing, does this add `+1` for the null-terminator if there is one, such as in using `.ascii` vs `.asciz`? So `hello` would have length `5` but `hello\0` would have length `6`? – samuelbrody1249 Sep 16 '20 at 23:28
  • In the second version, the null-terminator in `.asciz` is counted as well. So for `hello` you get 5 with `.ascii` and 6 with `.asciz`; either way, the result is exactly the number of bytes emitted. In the _runtime version_ you'd have to account for the terminator yourself. – zx485 Sep 16 '20 at 23:31
  • @samuelbrody1249 related: https://stackoverflow.com/questions/8987767/is-there-a-symbol-that-represents-the-current-address-in-gnu-gas-assembly – Ciro Santilli OurBigBook.com Sep 17 '20 at 06:57
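
The `.ascii` / `.asciz` difference discussed in the comments can be sketched as follows. All symbol names here are hypothetical and chosen only for illustration.

```asm
    .section .rodata
plain:
    .ascii "hello"                          # 5 bytes, no terminator
    plain_len = . - plain                   # 5

terminated:
    .asciz "hello"                          # 5 bytes plus a trailing NUL
    terminated_size = . - terminated        # 6, counts the NUL
    terminated_len  = . - terminated - 1    # 5, NUL excluded
```

Assignments emit no bytes, so `.` has the same value in both `terminated_*` expressions. Each symbol is set to the evaluated number, so `mov $plain_len, %eax` assembles to the same bytes as `mov $5, %eax`.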