Why are there empty address spaces between data sections in memory (x86 / nasm)?

Question

I am trying to write a small program that asks the user for their name, encodes the user input, and then prints a message to stdout, detailing the encoded input. For example, if the user inputs the name ‘John’, it will print “Your code name is: Red5” to stdout.

SECTION .data              ; Section containing initialised data

    RequestName: db "Please enter your name: "
    REQUESTLEN: equ $-RequestName

    OutputMsg: db "Your code name is: "
    OUTPUTLEN: equ $-OutputMsg

SECTION .bss               ; Section containing uninitialized data  

    EncodedName: resb ENCODELEN
    ENCODELEN: equ 1024

I have the first part of my output message, “Your code name is: “, stored (starting) at memory address 'OutputMsg', and the second part of the output message, which will be the encoded user input “Red5”, stored at memory address ‘EncodedName’. Therefore, to print the required message to stdout, I concatenate the two, using the following code:

mov rdx,OUTPUTLEN    ; Length of string 'OutputMsg'
add rdx,r8           ; r8 contains the number of bytes entered by the user
                     ; the code name is always equ in length to user input
mov rax,4            ; sys_write
mov rbx,1            ; stdout
mov rcx,OutputMsg    ; Offset of string to print to stdout
int 80h              ; Make kernel call

This works almost as expected. However, the last char is missing from the output. So instead of “Your code name is: Red5”, I get “Your code name is: Red”. On inspection of memory in the debugger, there is an empty memory address (0x00) erroneously ‘placed’ between the end of the ‘OutputMsg’ and the offset for ‘EncodedName’.

Address         Binary    ASCII     
0x… 60012a      0x20      Space  (This is the end of the data item ‘OutputMsg’)
0x… 60012b      0x00      NUL
0x… 60012c      0x52      R (The start of SECTION .bss / 'EncodedName')

I have tested this using several other code examples, and there always seems to be a ‘random’ placement of NUL character(s) between where the SECTION .data ends in memory and the SECTION .bss begins.

1) What is causing this empty address space, as it is not included in my source code?

2) The empty address space appears at the end of SECTION .data in all of the examples I have looked at, I assume therefore that this is expected behaviour. What are the specific reasons for this empty address space, is it to ‘mark’ the end of one section and the beginning of the next? Why would this be necessary?

3) How is the size of the space calculated? I have found that depending on the program and which section I am looking at, sometimes this space is one byte, sometimes two or three; how do I know before runtime how many bytes this empty space will be?

I can work around this. However, I would like to understand what is going on. I have written code that concatenates strings across the two SECTIONS, in order to print to stdout. The unexpected empty address space, which I cannot account for, is throwing off my calculations.

NASM version 2.11.08 Architecture x86 | Ubuntu 16.04

Generally you shouldn't rely on how different sections are placed relative to each other. They might not even be in the order given. The padding between them is determined by alignment requirements. Also the section contents may be coming from multiple files which are then merged by the linker. — Jester, Jul 19 '18 at 12:05
Well, according to [the manual](https://www.nasm.us/doc/nasmdoc7.html#section-7.9.2) NASM by default assumes a 4 byte alignment for most sections. An address ending with `0xb` is not 4-byte-aligned, while one ending with `0xc` is. — Michael, Jul 19 '18 at 12:07
With a larger `.bss` or larger `.data`, they will probably be in separate pages, with some unmapped pages between them. It's up to the linker to decide how to map sections to ELF executable segments ([What's the difference of section and segment in ELF file format](https://stackoverflow.com/q/14361248)), and whether to mark them as being mmapped or copied into memory. — Peter Cordes, Jul 19 '18 at 12:10
See the 2nd part of my answer on [Gnu assembler .data section value corrupted after syscall](https://stackoverflow.com/a/50584542) for some details about how a small program ended up being linked, with the same data and bss next to each other thing you're seeing, rather than in separate pages like I was expecting. — Peter Cordes, Jul 19 '18 at 12:13
BTW, [What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?](https://stackoverflow.com/q/46087730). You should port your code to `syscall` unless you're using 32-bit `int 0x80` for some specific reason. — Peter Cordes, Jul 19 '18 at 12:14
@PeterCordes I think I understand. This is perhaps the problem with learning to ‘code’, before learning how computers actually work! Am I right in saying then, that making the assumption that one section will follow on from another in memory, is _always_ a mistake, as during runtime, as pages are swapped, their location in physical memory can alter. Therefore, my implementation of concatenating two data items across sections, only works at all by sheer luck? i.e. `.bss` could be swapped and would no longer follow `.data` in physical memory; it may not have followed on at all to begin with? — Andrew Hardiman, Jul 19 '18 at 14:20
You don't deal with physical memory so that part, while true, doesn't concern applications. — Jester, Jul 19 '18 at 15:10
@PeterCordes thanks for the additional notification regarding `syscall`. No reason, it is just my primary learning resources have been x86, whereas my machine is x64, I need to make sure I am paying attention to the nuances between the two. — Andrew Hardiman, Jul 19 '18 at 15:42
@Michael Thank you. I could make my code work in that case by making sure that the number of bytes in `section .data` was divisible by 4; indeed this works, although I can see that it is not practical. I’ve seen that you can [manipulate the alignment](https://stackoverflow.com/questions/11277652/what-is-the-meaning-of-align-an-the-start-of-a-section#11277804), although only by a power of 2, so this is not much use here. Ultimately, my assumption that the two data items are contiguous in memory, is wrong. Therefore, my implementation of concatenating ‘strings’ across `sections` is incorrect. — Andrew Hardiman, Jul 19 '18 at 18:04
You can use `gcc -m32` to build 32-bit binaries, instead of trying to port examples to x86-64 at the same time as you're still learning. (e.g. `gcc -m32 -static -nostdlib foo.s` to assemble + link a 32-bit static executable). BTW, "x64" is only used on Windows. The actual name of the architecture is x86-64. — Peter Cordes, Jul 19 '18 at 20:40
BTW, you might be able to use a linker script to specify where you want your sections linked, so you could maybe guarantee that `.bss` follows `.data` directly. But it can only work if 1) `.data` ends at the end of a 4k page or 2) both sections end up in the same page, so the usual private read/write file mmapping of `.data` can't be used, or the zeros filling the BSS have to be stored explicitly in the file. — Peter Cordes, Jul 19 '18 at 20:47

score 2 · Accepted Answer · edited Aug 09 '20 at 16:39

Data alignment:

It is typical to think of memory as a flat array of bytes:

Data:       | 0x50 | 0x69 | 0x70 | 0x43 | 0x68 | 0x69 | 0x70 | 0x73 | 
Address:    |  0   |   1  |  2   |  3   |  4   |   5  |   6  |   7  |   
ASCII:      |  P   |   i  |  p   |  C   |  h   |   i  |   p  |   s  |

However, the CPU itself does not read, nor write, data to memory a byte at a time. Efficiency is the name of the game; therefore, a computer's CPU will read data from memory a fixed number of bytes at a time. The size in which the processor accesses memory is known as its memory access granularity (MAG).

Memory access granularity varies across architectures. As a general rule, MAG is equal to the native word size of the processor in question, e.g. IA-32 will have a 4-byte granularity.

If the CPU were to only read one byte at a time from memory, it would need to access memory 8 times in order to read the entirety of the above array. Compare this to if the CPU were to access memory 4-bytes at a time, a 4-byte granularity. In this case, the CPU would need to access memory only twice; 1 = bytes 0-3, 2 = bytes 4-7.

Where does memory alignment come into play:

Well, let’s assume a 4-byte MAG. As we have seen, in order to read the string “PipChips” from memory, the CPU would need to access memory twice. Now, let’s assume that the data was aligned in memory slightly differently. Let’s assume, the following:

Data:       | 0x6B | 0x50 | 0x69 | 0x70 | 0x43 | 0x68 | 0x69 | 0x70 | 0x73 |  
Address:    |   0  |   1  |   2  |   3  |   4  |   5  |   6  |  7   |   8  |    
ASCII:      |   k  |   P  |   i  |   p  |   C  |   h  |   i  |  p   |   s  |

In this example, to access the same data, the CPU would need to access memory a total of 3 times; 1 = bytes 0-3, 2 = bytes 4-7 and a third time to access “s”, at memory address 8. Furthermore, the processor would have to perform additional work, in order to shift-out the unwanted bytes, that were unnecessarily read from memory, due to the data being stored at an unaligned address.

This is where memory alignment comes into play. The CPU has a MAG, the main purpose of which is to increase machine efficiency. Therefore, aligning data in memory to match the machine's memory access boundaries creates more efficient code.

This is a(n) (overly) simplistic explanation of memory alignment, however it answers the question:

1) What is causing this empty address space, as it is not included in my source code?

The ‘empty address space’ is generated by the alignment requirements of the SECTION data. NASM defaults are assumed, if you do not specify values for the section properties. Please see the manual.
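As a rough sketch of what those defaults amount to, the directives below write out the ELF section defaults listed in the NASM manual, so the 4-byte alignment is visible in the source. It assumes an ELF output format (for example `-f elf64`); the labels `Msg` and `Buf` and their sizes are placeholders for illustration only.

```nasm
; NASM's documented ELF defaults for .data and .bss, written out explicitly.
; Omitting the qualifiers should give the same result.

SECTION .data progbits alloc noexec write align=4
    Msg: db "abc"               ; placeholder: 3 bytes, so .data does not end on a 4-byte boundary

SECTION .bss nobits alloc noexec write align=4
    Buf: resb 16                ; placeholder buffer; its start address must be a multiple of 4
```

A different alignment can be requested by changing the `align=` value, which must be a power of two.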

2) What are the specific reasons for this empty address space?

The overriding reason for aligning memory data is for software efficiency and robustness. As discussed, the processor will access memory at the granularity of its word size.

3) How is the size of the space calculated?

The section is padded so that the data immediately following it starts on the required alignment boundary. NASM records each section's alignment requirement in the object file, and the linker inserts the padding when it lays the sections out. In the original question, the last byte of section .data is at address 0x… 60012a, so in the absence of padding, section .bss would have started at address 0x… 60012b, which is not a multiple of 4. Consequently, one byte of padding (a NUL) is added in order to round the start address up to the next address that is divisible by 4, 0x… 60012c, and hence properly align the data.
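The arithmetic can be sketched with assemble-time constants. This is only an illustration: `ALIGNMENT`, `DATA_END`, `PADDING` and `BSS_START` are hypothetical names, and `DATA_END` uses just the low digits shown in the question's memory dump (the elided upper digits do not change the remainder modulo 4).

```nasm
; Assemble-time sketch of the padding calculation (no code or data is emitted).
ALIGNMENT equ 4                 ; assumed: NASM's default alignment for .bss
DATA_END  equ 0x60012b          ; assumed: first address after the last byte of .data

; Bytes needed to reach the next multiple of ALIGNMENT (0 if already aligned).
PADDING   equ (ALIGNMENT - (DATA_END % ALIGNMENT)) % ALIGNMENT   ; 1 for these values

; Where the next section can start once the padding is applied.
BSS_START equ DATA_END + PADDING                                 ; 0x60012c for these values
```

With a 4-byte alignment the padding is therefore always between 0 and 3 bytes, which matches the one, two or three byte gaps described in the question.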

The subtleties of memory access are many; for a more in-depth explanation, please see the wiki, and numerous on-line articles, e.g. here; and for the masochistic among you, there are always the manuals!

Generally, data alignment is handled automatically by the compiler/assembler, although programmer control is an option and in some cases desirable.

…………………………………………………………………………………………………………................................

Solving the original problem:

We are still left with the question of how to concatenate our two strings for output. We know now that the implementation of concatenating two strings across sections is not ideal, to say the least. Generally, we will not know where these sections are placed, in relation to each other, during runtime.

It is preferable therefore, to concatenate these strings in a region in memory, before making the syscall; as opposed to relying on the system call to provide the concatenation, based on assumptions of where the strings ought to be in memory.

We have several options:

1) Make two sys_write calls in succession, in order to print both strings, and give the illusion in the output that they are one: Although straightforward, this makes little sense, as system calls are expensive.
2) Directly read the user input into place: This seems the logical and most efficient thing to do, at least at first glance, as we can write the string without moving any data around, and with only one syscall. However, we face the problem of inadvertently overwriting data, as we have not reserved the space in memory. Also, it seems ‘wrong’ to read user input to the initialized .data section; initialized data is data that has a value before the program begins!
3) Moving ‘EncodedName’ in memory, so that it is contiguous with ‘OutputMsg’: This seems clean and simple. However, in reality it is not really any different to option 2, and suffers the same drawbacks.

The solution: Create a memory buffer and concatenate the strings into this memory buffer, prior to the sys_write system call.

```
SECTION .bss

    ENCODELEN: equ 1024                 ; Sizes are defined before the resb lines that use them
    EncodedName: resb ENCODELEN

    COMPLETELEN: equ 2048
    CompleteOutput: resb COMPLETELEN
```

User input will be read to ‘EncodedName’. We then concatenate ‘OutputMsg’ and ‘EncodedName’ at ‘CompleteOutput’, ready for writing to stdout:

    ; Read user input from stdin:
    mov rax,0                               ; sys_read
    mov rdi,0                               ; stdin
    mov rsi,EncodedName                     ; Memory offset in which to read input data
    mov rdx,ENCODELEN                       ; Length of memory buffer
    syscall                                 ; Kernel call
    
    mov r8,rax                              ; Save the number of bytes read from stdin
    
    ; Move string 'OutputMsg' to memory address 'CompleteOutput':
    mov rdi,CompleteOutput                  ; Destination memory address 
    mov rsi,OutputMsg                       ; Offset of 'string' to move to destination
    mov rcx,OUTPUTLEN                       ; Length of string being moved
    rep movsb                               ; Move string, iteration, per byte
    
    ; Concatenate 'OutputMsg' with 'EncodedName' in memory:
    mov rdi,CompleteOutput                  ; Destination memory address
    add rdi,OUTPUTLEN                       ; Add length of string already moved, so we append strings, as opposed to overwrite
    mov rsi,EncodedName                     ; Offset memory address of string being moved
    mov rcx,r8                              ; String length, during sys_read, the number of bytes read was saved in r8
    rep movsb                               ; Move string into place
    
    ; Write string to stdout:
    mov rdx,OUTPUTLEN                       ; Length of 'OutputMsg' 
    add rdx,r8                              ; add length of 'EncodedName' 
    
    mov rax,1                               ; sys_write
    mov rdi,1                               ; stdout
    mov rsi,CompleteOutput                  ; Memory offset of string
    syscall                                 ; Make system call

*Credit due to the comments in the original question, for pointing me in the right direction.

*a fixed number of bytes at a time* As you say, that's definitely over-simplified, but yes the default section alignment is 4 bytes and natural alignment for `int` is the main reason. Any unaligned access that doesn't cross an 8-byte boundary is a single cache access (and is atomic) on Intel and AMD ([Why is integer assignment on a naturally aligned variable atomic on x86?](https://stackoverflow.com/q/36624881)). On Intel P6 and later (Pentium Pro / PII), any cached unaligned access that doesn't cross a cache-line boundary (64 bytes on modern x86) is atomic (implies a single cache access). — Peter Cordes, Jul 23 '18 at 15:30
And BTW, modern CPUs *can* load/store single bytes without *having* to read/modify/write the containing word. [Can modern x86 hardware not store a single byte to memory?](https://stackoverflow.com/q/46721075). — Peter Cordes, Jul 23 '18 at 15:32
*The ‘empty address space’ is generated by the alignment requirements of the SECTION data.* In this case yes, but in other cases there will often be a large gap of multiple pages. Your case where `.data` and `.bss` are combined into a single page with just a bit of padding separating them is a special-case optimization for programs with very small data/bss. Your answer seems to imply that alignment is the *only* reason for a gap in the general case. And BTW, most of the URLs in your answer came out broken. Bare URLs will turn into links, or click the "link" button to use markdown. — Peter Cordes, Jul 23 '18 at 15:34
Re: your possible solutions: you can avoid copying the user input. **First copy `OutputMsg` from `.rodata` to your large buffer in `.bss`, then `sys_read` into place right after it**. (Or sys_read into place first; you know the length of `OutputMsg` so you can leave room to copy it without actually copying it. Having the first access to the page be inside the kernel might make the page-fault handling more efficient.) You're right that putting enough explicit zeros into `.data` would be inelegant, because they'd actually be stored in the executable file instead of just space reserved. — Peter Cordes, Jul 23 '18 at 15:43
To set RDX to the right length, you can use `lea edx, [rax + OUTPUTLEN]`. `rep movsb` doesn't clobber RAX, or you can do that before the copy. You don't need 64-bit operand-size; remember your buffer is only 2k, `sys_read` can't have returned a size that won't fit in 32 bits. You can use `CompleteOutput + OUTPUTLEN` directly, instead of using an `add` at runtime. e.g. `mov rsi, CompleteOutput + OUTPUTLEN` before `syscall` to read into place. (Or better, `mov esi, ...` or `lea rsi, [rel CompleteOutput + OUTPUTLEN]`. 64-bit immediate `mov` is a poor choice for addresses.) — Peter Cordes, Jul 23 '18 at 15:53
@PeterCordes Appreciate the input re’ the solutions. I was thinking along the lines, that reading straight into place must be the most elegant/efficient solution, but kept thinking no, because I would inadvertently overwrite data – so this is perfect; “*First copy ‘OutputMsg’ from `.rodata` to your large buffer in `.bss`, then `sys_read` into place right after it*” – Damn it, why didn’t I think of that! BTW, I’m not sure why the links broke, HTML was valid, changed it to [X](URL) and it’s come out fine. — Andrew Hardiman, Jul 23 '18 at 17:36
SO doesn't accept arbitrary HTML (and places limits on what sites you can link to, e.g. no URL shorteners); it probably ate part of the tags. — Peter Cordes, Jul 23 '18 at 17:42
@PeterCordes thanks. I'll bear that in mind. Can I ask, with the solution you gave, I would no longer require 'EncodedName' or 'ENCODELEN' in `SECTION .bss`. However, without these I would no longer have a variable to fix the length for `sys_read`, I do not like stating this as an immediate with the instruction, as I may want to easily change it later, how would you specify the size count for the read? — Andrew Hardiman, Jul 23 '18 at 17:51
Right, you don't need those. Use `mov edx, COMPLETELEN - OUTPUTLEN` to have the assembler calculate the space left in your buffer beyond the length of what you're going to copy into it. Or even put a label at the appropriate position in the buffer: `outbuf: resb msglen` / `read_position: resb 2048` (or 2048 - msglen or whatever you like). Add whatever `equ` size calculations you like to get named assemble-time constants. — Peter Cordes, Jul 23 '18 at 17:57
@PeterCordes I never thought of creating an address like so `mov rsi,CompleteOutput + OUTPUTLEN`, I think I am getting it confused with effective addressing; I better go away and do some more reading. Thanks for the input, it is appreciated. — Andrew Hardiman, Jul 23 '18 at 18:19
Anything that's a link-time constant can be done at build time. It shouldn't be that confusing; the addressing mode doesn't change, but the assembler creates an object file that asks the linker to add an offset to the symbol name. It's the same thing a compiler would do if you wrote `int foo(){ return static_array[10]; }`, e.g. `mov eax, [static_array + 40]`. There's no label on that address itself, but it's a known distance from a label so you can reference it relative to that. (Or like I said in my last comment, *put* labels where you want them with `resb msglen` / `label: resb some_more`.) — Peter Cordes, Jul 23 '18 at 18:26
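
The read-into-place approach suggested in the comments above can be sketched roughly as follows. This is a minimal sketch rather than the asker's final program: it assumes x86-64 Linux, the encoding step is left out, and the `_start` entry point, the `.exit` label, the exit call and the error handling are additions for illustration. The names `OutputMsg`, `OUTPUTLEN`, `CompleteOutput` and `COMPLETELEN` are taken from the answer.

```nasm
; Sketch only. Assumed build: nasm -f elf64 codename.asm && ld -o codename codename.o
SECTION .rodata
    OutputMsg: db "Your code name is: "
    OUTPUTLEN: equ $-OutputMsg

SECTION .bss
    COMPLETELEN: equ 2048
    CompleteOutput: resb COMPLETELEN

SECTION .text
global _start
_start:
    ; Copy the fixed prefix to the start of the output buffer.
    lea rdi, [rel CompleteOutput]               ; Destination
    lea rsi, [rel OutputMsg]                    ; Source
    mov ecx, OUTPUTLEN                          ; Byte count (writing ECX zero-extends into RCX)
    rep movsb

    ; Read the user input directly into place, right after the prefix.
    xor eax, eax                                ; sys_read (0)
    xor edi, edi                                ; stdin (0)
    lea rsi, [rel CompleteOutput + OUTPUTLEN]   ; Offset resolved at build time
    mov edx, COMPLETELEN - OUTPUTLEN            ; Space left in the buffer
    syscall                                     ; RAX = bytes read, or a negative error code

    test rax, rax
    jle .exit                                   ; Nothing read (or an error): skip the write

    ; (The encoding step would operate in place on the input here.)

    ; Write prefix and input with a single sys_write.
    lea edx, [rax + OUTPUTLEN]                  ; Total length = bytes read + prefix length
    mov eax, 1                                  ; sys_write
    mov edi, 1                                  ; stdout
    lea rsi, [rel CompleteOutput]
    syscall

.exit:
    mov eax, 60                                 ; sys_exit
    xor edi, edi                                ; Exit status 0
    syscall
```

Because the input lands directly in `CompleteOutput`, the separate `EncodedName` buffer and the second `rep movsb` are no longer needed, and nothing depends on how the linker places `.rodata`, `.data` and `.bss` relative to each other.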
