GNU assembler .data section value corrupted after syscall

Question

I have the following code:

.data
result: .byte 1
.lcomm input 1
.lcomm cha 2

.text
(some other code, syscalls)

At first everything is fine. But when a syscall (e.g. read) is made, the value at the label 'result' changes to some random garbage value.

Does anyone know what's wrong?

P.S. Environment: latest Debian x86_64, running in VirtualBox. Tools: as -g, ld, emacs, make (all latest).

-----edit-----

(continue)
.global _start
_start:
mov $3,%rax
mov $0,%rbx
mov $input,%rcx
mov $1,%rdx
int $0x80
(sys_exit)

The value of 'input' changed as expected, but the value of 'result' also changed to a random value after

int $0x80

Can you make a [mcve]? You haven't shown your actual code, making it very hard to actually see what the problem is. — fuz, May 29 '18 at 11:45
Post a [mcve]. `sys_read` will only modify bytes from `buf` to `buf+length`. Or better, set a breakpoint on the memory being modified to find out where. And BTW, why are you showing your BSS static data (`.lcomm`)? Is it being modified too? — Peter Cordes, May 29 '18 at 11:46
@Peter Cordes Example added. It's my first time using this site so, a lot to learn :) — , May 29 '18 at 11:57
`int $0x80` is not the right way to do 64 bit system calls. See [this question](https://stackoverflow.com/q/46087730/417501) for details. This could be the source of your problems but it's hard to say without a [mcve] which your code snippet is not. Namely, your code snippet lacks the “complete” and “verifiable” parts that are important for others to reproduce and diagnose your problem. — fuz, May 29 '18 at 11:57
How did you check the value at `result` before/after `int $0x80`? Did you use a debugger to check while single-stepping? This still doesn't look like a MCVE, because I know that `sys_read` won't modify other bytes in user-space memory. (And BTW, [What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?](https://stackoverflow.com/q/46087730): it's not recommended, and would fail if you made a PIE executable. Using 64-bit operand-size to set up args is a double waste here, because the 32-bit ABI ignores them, as well as the fact that `mov $3, %eax` is a shorter way to set RAX=3) — Peter Cordes, May 29 '18 at 12:01
@PeterCordes Yes using gdb, break before and after 'int' and 'print result' — , May 29 '18 at 12:03
@fuz: `int $0x80` in 64-bit code can't explain more than 1 byte changing, because the OP says `input` changed as expected. I'm wondering if the OP used `printf` or something, and was looking at 4 bytes starting at `result`, instead of just 1 byte, and `printf` changed some static data that ended up there. Hmm, nope, OP used GDB. — Peter Cordes, May 29 '18 at 12:03
Huh, wow that's super weird. I just tried it, and I can repro what you were probably doing. I put your code into `test.S` and built it with `gcc -nostdlib -no-pie test.S`. In GDB I set a watch point with `watch (int)result` (to watch 4 bytes like I guessed you were doing). `print (char)result` shows that *your* byte was unmodified, but some bytes after it were modified. — Peter Cordes, May 29 '18 at 12:09
@PeterCordes Oooh, I think that's the problem. I was watching 4 bytes instead of 1. So the value includes the next 3 bytes... As for the bytes next to it changing I think those are the places for 'input' and 'cha' — , May 29 '18 at 12:17
Doesn't `print result` treat `result` as type `int`, i.e. show 4 bytes instead of 1? Try printing it in hex with `p/x result` and check the low 8 bits (the last two hex digits); those should be your result byte value. — Ped7g, May 29 '18 at 12:17
@Ped7g: Yes, the default used to be `int` for symbols with no debug info. Now, with `gdb` 8.1 (Arch Linux), `print result` says `'result' has unknown type; cast it to its declared type`. This is a very recent change in GDB. This doesn't explain why a system call is spuriously changing user-space memory pages. Oh, apparently GAS + `ld` is placing `.lcomm input, 1` in the same page as the `.data` section. — Peter Cordes, May 29 '18 at 12:22
@PeterCordes And why do as + ld do that? I just tried print &result, print &input and print &cha; the addresses are 0x6001a9, 0x6001aa and 0x6001ac. Why is there a byte between input and cha? — , May 29 '18 at 12:32
@Space: Unlike using `.section .bss` and reserving space manually, `.lcomm` is free to pad to align a 2-byte object by 2. — Peter Cordes, May 29 '18 at 12:38
why not? (probably alignment, or just in a mood to give it one more byte) If you are asking "why the produced machine code is not 'perfect'?", then that's normal, the compiler doesn't have enough time to find perfect solution, and the definition of what is 'perfect' does change with regards to your intention/goal (size vs performance, etc). If you are just curious why, then you can debug the compiler itself to figure out which part of the code exactly decides this memory layout, but it doesn't sound like much fun, unless you are into compiler programming. — Ped7g, May 29 '18 at 12:38
@Ped7g: the compiler proper isn't involved here, just the assembler + linker, which normally just follow simple rules without looking for optimizations (except branch-displacements, using `jmp rel8` short encodings when possible, and min-size immediate and displacements.) — Peter Cordes, May 29 '18 at 12:41

Peter Cordes · Accepted Answer · 2018-05-29T12:59:19.833

You're looking at 4 bytes starting at `result`, which includes `input` as the 2nd or 3rd byte. (That's why the value goes up by a multiple of 256 or 65536, leaving the low byte = 1 if you print `(char)result`.) This would be more obvious if you use `p /x` to print as hex.

GDB's default behaviour for `print result` when there was no debug info was to assume `int`. Now, because of user errors like this, with gdb 8.1 on Arch Linux, `print result` says `'result' has unknown type; cast it to its declared type`.
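To make the arithmetic concrete, here is a small Python sketch (not GDB itself) that models the four bytes starting at `result` and reinterprets them as a little-endian 32-bit integer, which is roughly what printing the symbol as `int` does. The byte layout and the character `'a'` are assumptions chosen for illustration, not values taken from the asker's session.

```python
import struct

def as_int32(memory: bytes, offset: int = 0) -> int:
    """Interpret 4 bytes at offset as a little-endian signed int (x86 byte order)."""
    return struct.unpack_from("<i", memory, offset)[0]

# Hypothetical layout: result = 1, then input, then two more bytes (padding / cha).
before = bytes([1, 0, 0, 0])
after = bytes([1, ord("a"), 0, 0])  # assume read() stored 'a' (0x61) into input

print(hex(as_int32(before)))  # 0x1
print(hex(as_int32(after)))   # 0x6101
print(after[0])               # 1, the byte at result itself is untouched
```

The 4-byte value changes by 0x6100, a multiple of 256, while the low byte stays 1. If `input` sat one byte further along, the change would be a multiple of 65536 instead.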

GAS + ld unexpectedly (to me anyway) merge the BSS and data segments into one page, so your variables are adjacent even though you put them in different sections that you'd expect to be treated differently. (BSS being backed by anonymous zeroed pages, data being backed by a private read-write mapping of the file).

After building with `gcc -nostdlib -no-pie test.S`, I get:

(gdb) p &result
$1 = (<data variable, no debug info> *) 0x600126
(gdb) p &input
$2 = (<data variable, no debug info> *) 0x600128 <input>

Unlike using `.section .bss` and reserving space manually, `.lcomm` is free to pad if it wants. Presumably for alignment, maybe here so the BSS starts on an 8-byte boundary. When I built with clang, I got `input` in the byte after `result` (at different addresses).

I investigated by adding a large array with `.lcomm arr, 888332`. Once I realized it wasn't storing literal zeros for the BSS in the executable, I used `readelf -a a.out` to check further:
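As a rough illustration of alignment padding, the sketch below rounds an address up to a power-of-two boundary. It uses the addresses the asker reported in the comments (0x6001aa for `input`, 0x6001ac for `cha`). The assumption that `cha` is aligned to its 2-byte size is a guess that happens to fit those numbers, not documented `.lcomm` behaviour.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (must be a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

input_addr = 0x6001AA                 # address of 'input' reported by the asker
after_input = input_addr + 1          # 'input' is 1 byte long
cha_addr = align_up(after_input, 2)   # assumption: 2-byte 'cha' is aligned to 2

print(hex(cha_addr))  # 0x6001ac, one padding byte after 'input'
```

Under that assumption the single unused byte between `input` and `cha` is just padding, and it also falls inside any 4-byte read that starts at `result`.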

(related: What's the difference of section and segment in ELF file format)

...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000126 0x0000000000000126  R E    0x200000
  LOAD           0x0000000000000126 0x0000000000600126 0x0000000000600126
                 0x0000000000000001 0x00000000000d8e1a  RW     0x200000
  NOTE           0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
                 0x0000000000000024 0x0000000000000024  R      0x4

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .text 
   01     .data .bss 
   02     .note.gnu.build-id 

...

So yes, the .data and .bss sections ended up in the same ELF segment.

I think what's going on here is that the ELF metadata says to map MemSiz = 0xd8e1a bytes (of zeroed pages) starting at virtual address 0x600126, and to LOAD 1 byte from offset 0x126 in the file to virtual address 0x600126.

So instead of just an mmap, the ELF program loader has to copy data from the file into an otherwise-zeroed page that's backing the BSS and .data sections.
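The numbers in the second LOAD entry can be checked with a few lines of Python. This is only arithmetic on the values printed by `readelf` above, not a model of the kernel's loader; the function name is made up for this sketch.

```python
# Values copied from the second LOAD entry in the readelf output.
VIRT_ADDR = 0x600126
FILE_SIZ = 0x1
MEM_SIZ = 0xD8E1A

def zero_fill_range(virt_addr: int, file_siz: int, mem_siz: int) -> tuple[int, int]:
    """Return (start, end) of the bytes past the file-backed data, up to MemSiz."""
    return virt_addr + file_siz, virt_addr + mem_siz

start, end = zero_fill_range(VIRT_ADDR, FILE_SIZ, MEM_SIZ)
print(hex(start), hex(end), end - start)  # 0x600127 0x6d8f40 888345
```

Only the one byte at 0x600126 (`result`) comes from the file; the following 888345 bytes are zero-filled. That is 13 bytes more than the 888332-byte `arr`, presumably covering `input`, `cha`, and alignment padding.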

It's probably a larger .data section that would be required for the linker to decide to use separate segments.

@Space: I was wrong, it's not storing literal zeros in the file. Updated further with `readelf` output on how the segment that holds both the `.data` and `.bss` sections works. — Peter Cordes, May 29 '18 at 13:00
I think the variables end up in the `.data` section because OP didn't change to the `.bss` section before declaring them. The GNU assembler doesn't put variables in `.bss` unless you explicitly tell it to. — fuz, May 31 '18 at 07:47
@fuz: `.lcomm` reserves space in the BSS, and doesn't care about the current section. I double-checked this by using it after `.text`, and `mov %al, cha(%rip)` doesn't segfault, but a store to a `.byte` in `.text` *does*. Use `nm -n a.out` to check symbol addresses, which includes `__bss_start` and `_edata` so you can tell which static storage is where. — Peter Cordes, May 31 '18 at 07:59
Update on this: `.bss` is implemented by having a MemSiz larger than FileSiz for the segment holding `.data`. The last (or only) page gets extra bytes zeroed beyond the loaded file contents. — Peter Cordes, Oct 22 '22 at 05:12
