
I recently started learning assembly language for the Intel x86-64 architecture using YASM. While solving one of the exercises suggested in a book by Ray Seyfarth, I ran into the following problem:

When I place some characters into a buffer in the .bss section, I still see an empty string while debugging it in gdb. Placing characters into a buffer in the .data section shows up as expected in gdb.

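segment .bss
result      resb    75
buf         resw    100
usage       resq    1

segment .data
str_test    db      0, 0, 0, 0

segment .text
global main
main:
    mov eax, 'A'            ; use EAX: RBX is call-preserved, so main must not clobber it
    mov [buf], al           ; LINE 1: x/s &buf STILL SHOWS AN EMPTY STRING AFTER THIS
    mov [str_test], al      ; LINE 2: PLACES THE CHARACTER NICELY
                            ; (one-byte stores: str_test is only 4 bytes long, so an 8-byte store would run past it)
    xor eax, eax            ; return 0 from main
    ret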

In gdb I get:

  • after LINE 1, `x/s &buf` prints: 0x7ffff7dd2740 <buf>: ""

  • after LINE 2, `x/s &str_test` prints: 0x601030: "A"

It looks like `&buf` isn't evaluating to the correct address, so gdb still sees all zeros. According to the process's /proc/PID/maps, 0x7ffff7dd2740 isn't in the BSS of the process being debugged, so that makes no sense. Why does `&buf` evaluate to the wrong address while `&str_test` evaluates to the right one? Neither is a global symbol, but we did build with debug info.

Tested with GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10 on x86-64 Ubuntu 15.10.

I'm building with

yasm -felf64 -Worphan-labels -gdwarf2 buf-test.asm
gcc -g buf-test.o -o buf-test

nm on the executable shows the correct symbol addresses:

$ nm -n  buf-test     # numeric sort, heavily edited to omit symbols from glibc
...
0000000000601028 D __data_start
0000000000601038 d str_test
... 
000000000060103c B __bss_start
0000000000601040 b result
000000000060108b b buf
0000000000601153 b usage

(editor's note: I rewrote a lot of the question because the weirdness is in gdb's behaviour, not the OP's asm!).

Peter Cordes
Bulat M.
  • I basically rewrote the question, since your asm *was* working. The problem was that `&buf` in gdb was some weird address on the stack. `0x7ffff...` addresses are stack addresses in x86-64 Linux. In a non-PIE executable like this one, bss and data addresses are always in the low 2GB (so they fit in signed 32-bit integers, because many x86-64 instructions use sign-extended `imm32` immediates, e.g. `add rax, symbol` uses the `add r/m64, imm32` encoding). Anyway, I tested this on my own x86-64 desktop and reproduced the gdb weirdness, without finding a way to get the correct address of `buf` in a gdb expression. – Peter Cordes Aug 30 '16 at 08:01
  • The other clue that you're getting the wrong address is that it's 16B-aligned (the last hex digit is 0), but we know `buf` shouldn't be: it follows a `resb 75`, and the start of the bss is typically 16B-aligned (as we can see `result` is in the nm output). And yes, those symbol addresses are where the executable will actually be mapped every time you run it, because it's not relocatable. Code for Linux executables doesn't have to be position-independent, unlike dynamic libraries, which need that (compile with `-fPIC`, or avoid absolute addresses in hand-written asm). – Peter Cordes Aug 30 '16 at 08:05
  • Thanks, Peter. As I understand it, this is gdb behaviour. Is it possible to cope with it and somehow place characters in the .bss section? The incorrect position of buf on the stack instead of in .bss really annoys me, and I do not understand why it is like that. Should I align buf somehow? – Bulat M. Aug 30 '16 at 08:07
  • Your asm does exactly what you thought it would. The only problem is getting gdb to show you what's at address `0x60108b` by typing in something involving buf, instead of copy/pasting the numeric address. I ran `x /s 0x0060108b` and got `0x60108b: "A"`. – Peter Cordes Aug 30 '16 at 08:15
  • Note that you can see the numeric address in the disassembly. I have `set disassembly-flavor intel` and `layout reg` in my `~/.gdbinit`, so I don't have to use the `disas` command to see disassembly for instructions right around the current RIP. – Peter Cordes Aug 30 '16 at 08:17
  • And yes, it would be normal to align `buf`, with an `ALIGN 16` directive before it. Or simply put `result` last. Or make its size a multiple of 16. This is all assuming you want it to be 16B-aligned, rather than 64B (cache line size) or any other size. It all depends on what you want to use it for. – Peter Cordes Aug 30 '16 at 08:19
  • I understand all that, but one quirk still seems strange to me. What does "weirdness is in gdb's behaviour" mean? It's really strange to me that gdb shows the .bss buffer's address on the stack, instead of in the lowest 2G of the address space as you said. – Bulat M. Aug 30 '16 at 08:52
  • So far I've seen that `ptype str_test` shows `type = `, but `ptype buf` says `type = char *`. It's strange for me, too, and I'm not just learning asm; I'm already pretty good at it. :) What I meant was that the behaviour of your asm code is not weird, and the only thing that *is* weird is what gdb is doing. So the weirdness in the result comes from weirdness in gdb. I'm going to try changing the name to something else, in case it's picking up a symbol-type attribute from something else with the same name, maybe in the CRT start files or in glibc. – Peter Cordes Aug 30 '16 at 09:06

1 Answer


glibc includes a symbol named buf, as well.

(gdb) info variables ^buf$
All variables matching regular expression "^buf$":

File strerror.c:
static char *buf;

Non-debugging symbols:
0x000000000060108b  buf            <-- this is our buf
0x00007ffff7dd6400  buf            <-- this is glibc's buf

gdb happens to choose the symbol from glibc over the symbol from the executable. This is why ptype buf shows char *.

Using a different name for the buffer avoids the problem, and so does a `global buf` directive to make it a global symbol. You also wouldn't have a problem if you wrote a stand-alone program that didn't link libc (i.e. define `_start` and make an `exit` system call instead of running a `ret`).


Note that 0x00007ffff7dd6400 (the address of glibc's buf on my system; different from yours) is not actually a stack address. It looks like one, but it isn't: it has a different number of f digits after the 7. Sorry for the confusion in the comments and in an earlier edit of the question.

Shared libraries are also loaded near the top of the low 47 bits of virtual address space, near where the stack is mapped. They're position-independent, but a library's BSS space has to be in the right place relative to its code. Checking /proc/PID/maps again more carefully, gdb's &buf is in fact in the rwx block of anonymous memory (not mapped to any file) right next to the mapping for libc-2.21.so.

7ffff7a0f000-7ffff7bcf000 r-xp 00000000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7bcf000-7ffff7dcf000 ---p 001c0000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dcf000-7ffff7dd3000 r-xp 001c0000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd3000-7ffff7dd5000 rwxp 001c4000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd5000-7ffff7dd9000 rwxp 00000000 00:00 0        <--- &buf is in this mapping
...
7ffffffdd000-7ffffffff000 rwxp 00000000 00:00 0              [stack]     <---- more FFs before the first non-FF than in &buf.

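A normal `call` instruction with a `rel32` encoding can only reach targets within ±2GiB of the call site, so it can't reach a library function mapped near the top of the address space. It doesn't need to: because a library's address isn't known until run time, and GNU/Linux shared libraries have to support symbol interposition, calls to library functions actually go to the PLT in the executable (which is within reach), where an indirect `jmp` (with a pointer from the GOT) goes to the final destination.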

Community
Peter Cordes
  • Now it is clear that this was a banal name clash with the `buf` in libc, which is mapped somewhere near the stack. So it is worth choosing proper names for labels and variables. Thanks, Peter. – Bulat M. Aug 30 '16 at 11:19
  • @BulatM.: you probably wouldn't have had a problem if you didn't have debug symbols for glibc installed. It doesn't really pollute the global namespace. (And of course your `buf` wasn't a global symbol either). That's why there can be two symbols with different addresses without getting a link error. – Peter Cordes Aug 30 '16 at 11:23
  • And why did gdb choose to display the buf from libc and not the one from the asm program? And how can I deliberately view the contents of libc's buf and of the program's buf? I mean, how can I distinguish them in gdb commands? – Bulat M. Aug 30 '16 at 12:06
  • @BulatM.: Before the program starts running, `buf` is your program's symbol. But after it starts running, gdb loads symbols from glibc when it's dynamically loaded. My guess is that gdb uses the entry from whatever debug symbols it saw last. OTOH, if you were debugging C with a `static short buf[100]`, I think gdb would know based on symbol metadata and stuff. Especially if you compiled it with `-g`. But in assembly, you have to use directives manually to generate debug info. IDK how to disambiguate the symbol name. It might not be possible, or maybe there's some kind of filename prefix. – Peter Cordes Aug 30 '16 at 12:27
  • You mentioned using directives: "you have to use directives manually to generate debug info." Could you suggest how to do that in YASM syntax, please? That might be useful in other programs too. I am indeed generating debugging symbols via YASM options: yasm -f elf64 -g dwarf2 -m amd64 main.asm; clang -o main main.o – Bulat M. Aug 30 '16 at 12:44
  • IDK how in NASM/YASM syntax, but if you look at `gcc -S` output, you'll see `.type` directives. That is what I was thinking of when I wrote the previous comment, but the actual C type names are only in the debug info, which is a separate thing. Anyway, have a look at [this example on the Godbolt compiler explorer](https://godbolt.org/g/xRyxAz) where you can see the `.type x, @object` directives and stuff like that. Hmm, I actually don't see a `.type buf` (for the array in bss). There's debug info for it, but I don't see a `.size` or `.type`. It's too inconvenient to use manually. – Peter Cordes Aug 30 '16 at 12:58
  • @BulatM.: I mean, YASM does generate enough debug info for gdb to be usable, but you don't get type information because YASM doesn't know it. Manually writing directives to produce that kind of debug info yourself is not worth the effort. It looks like the best bet is just to keep an eye out for name collisions with glibc, if you link it. Or just make your static data `global`, with a `global buf` directive; then it should take priority over the `static char *buf` in glibc. Or just don't even use static data in the first place; allocate space on the stack. – Peter Cordes Aug 30 '16 at 13:03
    For C objects, gdb can distinguish between `static` objects with the same name from different source files by doing `x/s 'foo.c'::buf`. I haven't found a way to do this for functions from asm source, though, not even with `-F dwarf` to have nasm include debug info. The obvious `x/s 'foo.asm'::buf` doesn't work. – Nate Eldredge Oct 05 '21 at 17:03