As a learning experiment, I am interested in creating a hashtable in assembly (x86-64 in NASM on OSX). One of the requirements is to be able to dynamically allocate/manage memory.
After looking through many resources on how to allocate memory in assembly, most of them recommend either brk
or mmap
syscalls. I haven't learned exactly how these worked yet because I found another implementation of memory allocation in BareMetal-OS that doesn't use any system calls (copied their code below).
My question is, how are they doing this? Can you explain the relevant instructions in their assembly that perform the memory allocation, for someone without a systems programming background and who is new to assembly? The reason for wanting to understand how to implement memory allocation in assembly is to be able to implement a hashtable in assembly.
Being new to assembly (I mainly do JavaScript), and having not found any detailed resources yet on memory allocation in assembly, I don't know where to start. It may be obvious to you, but you have the background, which I don't. I have done some assembly the past week or two, so I understand the basics about mov
on registers, and the jump commands, but don't yet understand the additional stuff they are doing to implement this memory stuff. My thinking is, if they can implement memory allocation in assembly without brk
or mmap
, then I want to do it that way because then I really am manipulating the memory directly without any system layers, and it seems like you can really fine-tune stuff.
Here is their code copied from GitHub:
https://github.com/ReturnInfinity/BareMetal-OS/blob/master/os/syscalls/memory.asm
# =============================================================================
# BareMetal -- a 64-bit OS written in Assembly for x86-64 systems
# Copyright (C) 2008-2014 Return Infinity -- see LICENSE.TXT
#
# Memory functions
# =============================================================================
align 16
db 'DEBUG: MEMORY '
align 16
# -----------------------------------------------------------------------------
# os_mem_allocate -- Allocates the requested number of 2 MiB pages
# IN: RCX = Number of pages to allocate
# OUT: RAX = Starting address (Set to 0 on failure)
# This function will only allocate continuous pages
os_mem_allocate:
push rsi
push rdx
push rbx
cmp rcx, 0
je os_mem_allocate_fail # At least 1 page must be allocated
# Here, we'll load the last existing page of memory in RSI.
# RAX and RSI instructions are purposefully interleaved.
xor rax, rax
mov rsi, os_MemoryMap # First available memory block
mov eax, [os_MemAmount] # Total memory in MiB from a double-word
mov rdx, rsi # Keep os_MemoryMap unmodified for later in RDX
shr eax, 1 # Divide actual memory by 2
sub rsi, 1
std # Set direction flag to backward
add rsi, rax # RSI now points to the last page
os_mem_allocate_start: # Find a free page of memory, from the end.
mov rbx, rcx # RBX is our temporary counter
os_mem_allocate_nextpage:
lodsb
cmp rsi, rdx # We have hit the start of the memory map, no more free pages
je os_mem_allocate_fail
cmp al, 1
jne os_mem_allocate_start # Page is taken, start counting from scratch
dec rbx # We found a page! Any page left to find?
jnz os_mem_allocate_nextpage
os_mem_allocate_mark: # We have a suitable free series of pages. Allocate them.
cld # Set direction flag to forward
xor rdi, rsi # We swap rdi and rsi to keep rdi contents.
xor rsi, rdi
xor rdi, rsi
# Instructions are purposefully swapped at some places here to avoid
# direct dependencies line after line.
push rcx # Keep RCX as is for the 'rep stosb' to come
add rdi, 1
mov al, 2
mov rbx, rdi # RBX points to the starting page
rep stosb
mov rdi, rsi # Restoring RDI
sub rbx, rdx # RBX now contains the memory page number
pop rcx # Restore RCX
# Only dependency left is between the two next lines.
shl rbx, 21 # Quick multiply by 2097152 (2 MiB) to get the starting memory address
mov rax, rbx # Return the starting address in RAX
jmp os_mem_allocate_end
os_mem_allocate_fail:
cld # Set direction flag to forward
xor rax, rax # Failure so set RAX to 0 (No pages allocated)
os_mem_allocate_end:
pop rbx
pop rdx
pop rsi
ret
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
# os_mem_release -- Frees the requested number of 2 MiB pages
# IN: RAX = Starting address
# RCX = Number of pages to free
# OUT: RCX = Number of pages freed
os_mem_release:
push rdi
push rcx
push rax
shr rax, 21 # Quick divide by 2097152 (2 MiB) to get the starting page number
add rax, os_MemoryMap
mov rdi, rax
mov al, 1
rep stosb
pop rax
pop rcx
pop rdi
ret
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
# os_mem_get_free -- Returns the number of 2 MiB pages that are available
# IN: Nothing
# OUT: RCX = Number of free 2 MiB pages
os_mem_get_free:
push rsi
push rbx
push rax
mov rsi, os_MemoryMap
xor rcx, rcx
xor rbx, rbx
os_mem_get_free_next:
lodsb
inc rcx
cmp rcx, 65536
je os_mem_get_free_end
cmp al, 1
jne os_mem_get_free_next
inc rbx
jmp os_mem_get_free_next
os_mem_get_free_end:
mov rcx, rbx
pop rax
pop rbx
pop rsi
ret
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
# os_mem_copy -- Copy a number of bytes
# IN: RSI = Source address
# RDI = Destination address
# RCX = Number of bytes to copy
# OUT: Nothing, all registers preserved
os_mem_copy:
push rdi
push rsi
push rcx
rep movsb # Optimize this!
pop rcx
pop rsi
pop rdi
ret
# -----------------------------------------------------------------------------
# =============================================================================
# EOF
Also note, I have read many resources on creating hashtables in C, one of which I have copied here (which has the C code, and corresponding assembly). However, pretty much all of the C examples use malloc
, which I want to avoid. I am trying to learn assembly without depending on C at all.
Also, this resource from Quora was helpful in pointing to the places in the malloc.c
source code where brk
and mmap
are used. However, I haven't studied that yet because of discovering the BareMetal-OS memory.asm
code, which seems to allocate memory without even using those syscalls. Hence the question, how are they doing that? Can you explain the relevant instructions in their assembly that perform the memory allocation?
Update
This book helps explain pretty much everything about the internals of memory below mmap
and brk
, it's all in the area of implementing operating systems. http://www.amazon.com/Modern-Operating-Systems-4th-Edition/dp/013359162X