5

When an assembly program is compiled and run on a machine without an operating system, how is a starting address in RAM chosen so that variables declared with data directives are allocated properly?

  • 1
    I think it's typically predefined by the CPU. For example, the 8086 was hard-wired to start executing at address FFFF0, which is assumed to be within ROM. What processor architecture are you using? It could be different for each. – Joe White Feb 03 '12 at 17:01
  • x86. So you're saying that the first program that runs gets that address? If so, what about subsequent programs? – TheResolute Feb 03 '12 at 17:04
  • 3
    For subsequent programs, *you* manage it. Also, can they really be called programs? At that point they're really just blocks of data that are executed in the hopes of it being code. – harold Feb 03 '12 at 17:11
  • Does the assembly code for either the first program or any subsequent programs ever include any explicit RAM addresses such as 0x00000? – TheResolute Feb 03 '12 at 17:22
  • The OS manages RAM; I'm not sure I understand your question. The OS starts at some predetermined address – Adrian Feb 03 '12 at 17:31
  • Daniel seems to have answered my question – TheResolute Feb 03 '12 at 17:37

2 Answers

7

As said, the address at which the CPU expects the first program to be is usually hardwired. It may be programmable on some very specific CPUs, but in the case of x86 it's FFFF0 on the original 8086, or - on 32-bit and later parts - FFFFFFF0, so 16 bytes below the top of the address space the CPU starts out with. The motherboard usually maps these addresses to the ROM, which contains (most probably) a jump to the BIOS code, which in turn boots up the computer.
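To make the arithmetic concrete, here is a minimal sketch of where those two reset addresses come from. It assumes the documented reset state of the 8086 (CS = FFFFh, IP = 0000h) and of the 80386 (hidden CS base FFFF0000h, EIP = FFF0h); the function names are made up for this illustration.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def reset_address_8086() -> int:
    # The 8086 comes out of reset with CS = FFFFh and IP = 0000h.
    return real_mode_address(0xFFFF, 0x0000)


def reset_address_386() -> int:
    # On the 80386 the hidden CS base is FFFF0000h and EIP is FFF0h right
    # after reset, so the first fetch lands just under the 4 GiB mark.
    cs_base, eip = 0xFFFF0000, 0x0000FFF0
    return cs_base + eip


if __name__ == "__main__":
    print(hex(reset_address_8086()))        # 0xffff0
    print(hex(reset_address_386()))         # 0xfffffff0
    print((1 << 32) - reset_address_386())  # 16
```

Either way only 16 bytes are left before the end of the address space, which is why the code sitting there is normally just a jump.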

When it comes to operating systems themselves, they choose where to load the program, then do the actual loading, and transfer execution to it. In the case of DOS, for example, simple small applications (those distributed as COM files) were loaded at offset 100h within a segment picked by DOS, which then jumped to that address, effectively beginning execution of the code loaded there. With more advanced systems that employ virtual memory, the issue is of course more complicated.
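As a rough sketch of what "the loader chooses the address" means for a COM file, the snippet below copies an image to offset 100h of a segment and returns the entry point. The `load_com` helper, the flat `bytearray` standing in for 1 MiB of RAM, and the segment value 2000h are all assumptions for illustration, not how DOS is actually implemented.

```python
COM_ENTRY_OFFSET = 0x100  # the first 256 bytes of the segment are reserved (PSP)


def load_com(memory: bytearray, segment: int, image: bytes) -> int:
    """Copy a .COM image into memory and return the linear entry address."""
    if len(image) > 0x10000 - COM_ENTRY_OFFSET:
        raise ValueError("a .COM image must fit in one 64 KiB segment")
    entry = (segment << 4) + COM_ENTRY_OFFSET
    memory[entry:entry + len(image)] = image
    return entry


if __name__ == "__main__":
    ram = bytearray(1 << 20)                   # 1 MiB of real-mode address space
    program = bytes([0xB4, 0x4C, 0xCD, 0x21])  # mov ah, 4Ch / int 21h (exit)
    entry = load_com(ram, 0x2000, program)
    print(hex(entry))                          # 0x20100
```

The program itself never mentions 2000h; it only assumes that it starts at offset 100h of whatever segment it was given.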

Daniel Kamil Kozar
  • 1
    Writing a program that will run on a machine without an operating system would require explicitly defining RAM addresses, because the operating system is responsible for memory allocation for other programs. Correct? – TheResolute Feb 03 '12 at 17:32
  • Yes. If you run a program without an operating system, the program itself is in fact the operating system. :) Therefore, you have to access every unit of memory by giving its explicit address. You have to keep in mind, though, that all the addresses aren't always available to ordinary applications even in real mode due to memory mapped devices and such. – Daniel Kamil Kozar Feb 03 '12 at 18:31
  • Real mode is a very outdated concept. Ring 0 would perhaps have been better: http://en.wikipedia.org/wiki/File:Priv_rings.svg – Prof. Falken Feb 23 '12 at 14:50
4

Each processor, and sometimes each subset of a processor family, has a different booting scheme. Usually it falls into one of these categories:

1) A table of addresses for different kinds of interrupts/events (reset, interrupt, NMI, etc.), often called a vector table. This table is at a hardcoded location; the hardware powers up, reads the address stored at the reset vector, and then starts executing code at that address.

2) A table of instructions at known addresses. ARM, for example, uses this. You get one instruction per vector/event, so that one instruction has to be a load to the pc or a branch (or, if the vectors that follow are not going to be used, you can use more instructions to get out of the vector table).

3) let me think about some others

Some companies (processor architectures) use a high address for the vectors. The msp430, for example, has its reset vector at 0xFFFE; the 6502 is similar, with reset at 0xFFFC and the interrupt (IRQ) vector at 0xFFFE. As you have seen in other answers, the 8088/86 and family use an address ending in FFFF0 (0xFFFF0, or 0xFFFFFFF0 on the 32-bit parts) as the entry point/reset address.

Other folks like ARM and the Atmel AVR use address 0x0000 as the entry point.
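The difference between the two categories can be sketched in a few lines. This is only a toy model: the 64 KiB memory, the start address 0xC000 and the function names are assumptions for illustration, with the vector placed at 0xFFFE as on the msp430.

```python
def read_word_le(memory: bytes, address: int) -> int:
    """Read a 16-bit little-endian word."""
    return memory[address] | (memory[address + 1] << 8)


def reset_via_address_table(memory: bytes, reset_vector: int = 0xFFFE) -> int:
    """Category 1: the table entry holds an address that is loaded into the pc."""
    return read_word_le(memory, reset_vector)


def reset_via_instruction_table(reset_entry: int = 0x0000) -> int:
    """Category 2: the pc is set to the table entry and that instruction runs."""
    return reset_entry


if __name__ == "__main__":
    rom = bytearray(0x10000)
    rom[0xFFFE] = 0x00                          # low byte of the start address
    rom[0xFFFF] = 0xC0                          # high byte -> 0xC000 (made up)
    print(hex(reset_via_address_table(rom)))    # 0xc000
    print(hex(reset_via_instruction_table()))   # 0x0
```

In the first case the table contents are data; in the second they are code, which is why that first instruction has to be a branch or a load to the pc.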

If you go into your wayback machine, the processor's memory interface used to be very simple: some address bits, data bits, read and write strobes, and maybe a chip select or output enable or something. You would use external logic, 74LSxx parts for example, to decode some of the upper address bits. Decoding all zeros and all ones is easy, so you would either have rom at the upper addresses (upper address bits all ones) and ram at the lower addresses (upper address bits all zeros) or vice versa. Even today, in microcontrollers and processors in general, this makes life easy. So your ARM and AVR will tend to assume that address 0 is rom and ram is somewhere else. Your x86 and msp430 and others will put rom at the top and ram down below or somewhere in the middle.
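The decode logic described here is small enough to model directly. The sketch below assumes a 16-bit address bus with the top two bits fed to the decoder; those widths, the `decode` name and the "unmapped" result are choices made for this example, not properties of any particular board.

```python
def decode(address: int, addr_bits: int = 16, decode_bits: int = 2,
           rom_at_top: bool = True) -> str:
    """Select a chip from the upper address bits, like a small 74LSxx decoder."""
    top = address >> (addr_bits - decode_bits)
    all_ones = (1 << decode_bits) - 1
    if top == all_ones:                 # upper bits all ones
        return "ROM" if rom_at_top else "RAM"
    if top == 0:                        # upper bits all zeros
        return "RAM" if rom_at_top else "ROM"
    return "unmapped"


if __name__ == "__main__":
    print(decode(0xFFFE))                     # ROM
    print(decode(0x0200))                     # RAM
    print(decode(0x0000, rom_at_top=False))   # ROM
```

Flipping `rom_at_top` gives the other layout, with rom at address 0 and ram up high.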

Do you have to hardcode this address in assembly language? Not necessarily. Usually you hardcode something for the linker so that it puts the vectors in the right place, but there are assemblers that still support .org-type directives, where you state that the code after the directive is meant to live at this address in the memory space.
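What an origin actually changes is the absolute addresses that get baked into the instructions, not where the bytes end up being loaded. As a hedged sketch: the helper names below are invented, the label offset 0x20 is arbitrary, and the only hardware fact relied on is that 16-bit `mov dx, imm16` encodes as BAh followed by a little-endian immediate.

```python
def absolute_address(origin: int, offset_in_image: int) -> int:
    """Address the assembler assumes for a label, given the declared origin."""
    return origin + offset_in_image


def encode_mov_dx_imm16(value: int) -> bytes:
    """Encode 16-bit `mov dx, imm16`: opcode BAh, then the immediate, low byte first."""
    return bytes([0xBA]) + (value & 0xFFFF).to_bytes(2, "little")


if __name__ == "__main__":
    msg_offset = 0x20  # hypothetical: the label sits 0x20 bytes into the image
    for origin in (0x7C00, 0x0100):
        code = encode_mov_dx_imm16(absolute_address(origin, msg_offset))
        print(hex(origin), code.hex())
    # 0x7c00 ba207c
    # 0x100 ba2001
```

Same source, different bytes: if the image is then loaded somewhere other than the origin it was assembled for, that embedded address points at the wrong place.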

Every processor has some sort of bootloader, even your pc. This is a single program, compiled knowing the processor's booting rules, and it is the first bit of code that runs on that processor. From there it can vary. You may only ever have that one program, with no plans to change programs after boot; sometimes this is just called firmware. Sometimes the bootloader has a prompt of some sort: maybe you use a serial port and a dumb terminal to type commands or download a program using xmodem, etc. (uboot, redboot, etc. do this sort of thing). Often you want to boot and get up and running without human intervention, so there might be an opportunity to press a key on a terminal/serial port or a button on the board when booting. Otherwise the bootloader has a set of rules for what to do next, and those rules are unique to the author(s) of the bootloader; this is just software, you can do whatever you want.

On your computer, for example, this bootloader is collectively called the BIOS, and you can probably press F1 or F2 or DEL or some other key or key sequence to interrupt the normal boot process and do something else (change BIOS settings, etc.). Otherwise the BIOS has a set of rules (which you can sometimes/often modify) for where it looks for the next thing to load and run, usually trying to find some media like a hard disk or usb stick or floppy or cd/dvd rom that contains a boot sector. That boot sector is just more code; the BIOS enforces rules about what that code has to look like, what address it has to be compiled for, etc. The BIOS loads that code, and being another form of bootloader, that code can do whatever it wants: boot linux, boot windows, boot dos, ask you some questions about what you want to boot, run a memory test program, whatever.

Down the road you might load another program, an operating system: a single program that has its own rules, and if it chooses to allow other programs to be loaded, those programs have to live by its rules. When you have an mmu you can make a program think it is running at one address when it is really at another, which means you can force all the programs to assume they are running from a specific address and then just compile them for that address (most of the programs you write for windows/linux, etc. are like this). But if you don't have an mmu then there could be different rules, for example that the whole program has to be position independent code (no hardcoded addresses anywhere); I am told uclinux is like this.
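A toy model of the mmu trick, assuming 4 KiB pages and a single-level page table kept in a dict (real mmus are more involved, and the addresses and frame numbers here are invented):

```python
PAGE_SIZE = 4096


def translate(page_table: dict, virtual: int) -> int:
    """Map a virtual address to a physical one through a one-level page table."""
    page, offset = divmod(virtual, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    # Two programs compiled for the same address, backed by different frames.
    table_a = {0x400: 0x10}
    table_b = {0x400: 0x2A}
    print(hex(translate(table_a, 0x400010)))  # 0x10010
    print(hex(translate(table_b, 0x400010)))  # 0x2a010
```

Both programs use address 0x400010, yet they never touch the same physical memory, which is what lets everything be compiled for one fixed address.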

Microcontrollers are a good place to learn this kind of thing, as you might already have a bootloader in place (the arduinos come this way, and the lpc and st chips often have a bootloader, serial or usb or both). Sometimes, as with the avr's, the bootloader is in a flash that you can modify, with separate bootloader flash and application flash; sometimes the bootloader is in a flash that only the chip manufacturer knows how to modify, so you are stuck with their bootloader forever. In either case you can still make your own bootloader that rides on top of that, where you create a way to load other programs, rules for those programs to exit back to the bootloader so that other programs can run, etc. I have some instruction set simulators (in various stages of debugging; the thumbulator one has had some use to flush it out, as has the pcemu one that I forked from someone else) that you can wrap a dumb terminal around to make your own environment without having to actually buy anything (you can do this with qemu and gdb, but not as easily).

This is a problem with all processors, but look at these microcontrollers in particular: typically the boot code, including the vectors, is in rom. Once you are running, though, you don't always want the vectors to be the ones hardcoded in rom. Either you make some sort of scheme where the rom code branches to something in ram, or there is a hardware scheme where after booting you can flip a switch somewhere (write a bit in a register) and change what memory is decoded at that address, allowing you to switch in ram and replace the vector table with something soft. Some of these microcontrollers have some overly complicated schemes for this. Here again you have to look at the documents for that processor and/or chip to find out how it works. Sometimes it is something outside the chip, on the board, that causes the address space to change, so you have to read up on that.

Because most programmers at this level already know this stuff, the documents often do not go into much detail. They might simply state that the vector table starts here and here are the items, or they may go through each item in the table specifying the address, but never state whether the contents of that table are addresses into memory or instructions to be executed... You have to "just know". And how you learn is by asking others, in the form of looking at other people's code.

Variables and data are managed by you and the compiler. The language and compiler you use will, for example, create .text, .data, and .bss segments, and through the compiler or linker you have to give the tools specific addresses for where to place each of these in that binary. In this deeply embedded case that means going to the documents for the system (chip plus board) to find out where each part needs to be in order to boot and run properly. AND you have to get the contents there somehow as well. If, for example, you write code that creates a .data segment (I normally don't) and you are booting from flash, the compiler has to compile as if the .data segment is in ram, but at boot it isn't in ram yet; you have to keep a copy of the .data segment somewhere in rom and copy it to ram before you get into the main body of code. Likewise you have to set up the stack pointers and, if needed, modify the vector table, etc. If you are using dram, you have to initialize the memory with code that may have to run without memory; you get dram up, then copy .data, zero .bss, set the stack pointer(s), then branch to main().
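The copy-.data, zero-.bss part of that sequence can be simulated in a few lines. This is a sketch under assumptions: rom and ram are modeled as separate byte arrays, and the `layout` dictionary with its keys is hypothetical, standing in for the symbols a linker script would normally provide. Real startup code does this in assembly or C before main().

```python
def c_startup(rom: bytes, ram: bytearray, layout: dict) -> None:
    """Do what startup code does before main(): copy .data, then zero .bss."""
    # .data: the initial values live in rom, the variables themselves in ram.
    src, dst, size = layout["data_load"], layout["data_start"], layout["data_size"]
    ram[dst:dst + size] = rom[src:src + size]
    # .bss: no stored image, it just has to read as zero.
    bss, n = layout["bss_start"], layout["bss_size"]
    ram[bss:bss + n] = bytes(n)


if __name__ == "__main__":
    rom = bytes([0x00] * 4 + [1, 2, 3])   # pretend code, then the .data image
    ram = bytearray(b"\xAA" * 16)         # ram holds garbage at power-up
    layout = {"data_load": 4, "data_start": 0, "data_size": 3,
              "bss_start": 3, "bss_size": 5}
    c_startup(rom, ram, layout)
    print(ram[:8].hex())                  # 0102030000000000
```

Anything outside .data and .bss still holds the power-up garbage, which is why a program that skips this step sees uninitialized "initialized" variables.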

old_timer
  • A directive like `.org 0x7c00` (for a legacy x86 PC BIOS MBR boot sector) doesn't *cause* the code to be loaded at that address. Instead it *tells* the assembler what address it should expect the code / data to be at. x86 (before x86-64) doesn't have any PC-relative addressing modes, so an instruction like `mov dx, OFFSET msg` needs to embed the *absolute* address of the data into an immediate. To do this correctly, the assembler needs to know the right origin (base address) to assume for position-dependent code. It would be unnecessary for position-independent code, e.g. using `call/pop`. – Peter Cordes Dec 31 '18 at 20:59
  • @PeterCordes yep, helps the linker, nothing you do can "cause" the code to end up somewhere there are assumptions and trust but those can fall apart easily... – old_timer Jan 01 '19 at 15:09
  • Fun fact: When assembling a flat binary with many x86 assemblers, there is no linker. e.g. `nasm` defaults to making a flat binary directly, and only makes a `.o` for a linker with `nasm -felf` or similar. So `org 0x7c00` (the NASM or MASM directive, not `.org` for GAS) is purely at assemble time, there is no linking. GAS is different, though; you can't directly make a flat binary, and need `ld`. You can even provide a linker script to control the layout. [How to generate plain binaries like nasm -f bin with the GNU GAS assembler?](https://stackoverflow.com/q/6828631) – Peter Cordes Jan 01 '19 at 21:53
  • well when I say linker the portion of the linking job to place the items in a memory space then place them in an output binary, be it the one assembled chunk of images or an actual object. If there are multiple .org items then you are in some way linking them into a binary. I was thinking in terms of masm and tasm and basm and others that predate those, which nasm is our currently maintained clone. I agree though that a simple single file with a single .org makes it easier to skip formal steps and aim for a final binary. – old_timer Jan 01 '19 at 22:13