
Recently I've been learning how to write a boot sector. Here is the complete code that I am studying:

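org 07c00h                  ; assume the code is loaded at offset 7C00h
    mov ax, cs
    mov ds, ax              ; DS = CS
    mov es, ax              ; ES = CS
    call DispStr            ; print the message
    jmp $                   ; loop forever

DispStr:
    mov ax, BootMessage
    mov bp, ax              ; ES:BP = address of the string
    mov cx, 16              ; CX = number of characters to write
    mov ax, 01301h          ; AH = 13h (write string), AL = 01h (update cursor)
    mov bx, 000ch           ; BH = page 0, BL = attribute 0Ch (bright red on black)
    mov dl, 0               ; DL = column 0
    int 10h                 ; BIOS video service
    ret

BootMessage: db "Hello, OS!"
times 510-($-$$) db 0       ; pad with zeros up to byte 510

dw 0xaa55                   ; boot signature in the last two bytes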

It's very simple code if you know how a system boots. The result is the line Hello, OS! displayed on the screen. The only thing that I don't understand is the first line: org 07c00h.

The book tells me that this line makes the compiler locate the addresses at 7c00h, but the explanation is very ambiguous, and I really don't know what it's for here. What in the world does the line org 07c00h do here?

I tried removing the line, used nasm to create a bin file, and then booted the bin file in bochs. Nothing was different from before: "Hello, OS!" was displayed on the screen too.

Can I say that the first line does nothing here? What's the use of org xxxx?

Peter Cordes
Searene
  • It means exactly what the book says. If you don't understand it, you should probably review the basics again. In particular, you need to understand how memory works. – Karl Knechtel Apr 24 '12 at 15:52
  • As the [nasm manual](http://www.nasm.us/doc/nasmdoc7.html#section-7.1.1) says: "The function of the ORG directive is to specify the origin address which NASM will assume the program begins at when it is loaded into memory.". I.e. you're telling the assembler something it can't figure out on its own: at what address the program will be loaded. – user786653 Apr 24 '12 at 16:32
  • @Karl: And you should understand first what helping and being kind is and how to answer people to enlighten them instead of just pissing off. – SasQ Jun 17 '12 at 21:47
  • So you could help him understand those fundamentals. If you had the time to write such uninformative comment, you had (the same) time for writing something more enlightening. It's not needed to write a book, it's enough to throw a link to some explanation of memory segmentation somewhere over the Net. Why to comment only to not help? – SasQ Jun 17 '12 at 22:49

2 Answers


The assembler translates each line of your source code into a processor instruction and emits these instructions in sequence, one after another, into the output binary file. While doing that, it maintains an internal counter that tracks the current address of each instruction, starting from 0 and counting upwards.

If you're assembling a normal program, these instructions end up in the code section of some object file with just blank slots for addresses, which the linker fills in with the proper addresses afterwards, so it's not a problem.

But when you assemble a flat binary file without any sections, relocations, or other formatting, just raw machine instructions, the assembler has no information about where your labels point or what the addresses of your code & data are. So, for example, when you have an instruction mov si, someLabel, the assembler can only calculate the offset of this label starting from 0 at the beginning of the binary file. (I.e. the default is ORG 0 if you don't specify one.)

If that's not the case, and you want your machine instructions + data in memory to begin at some other address, e.g. 7C00, then you need to tell the assembler that the starting address of your program is 7C00 by writing org 0x7C00 at the beginning of your source. This directive tells the assembler to start its internal address counter at 7C00 instead of 0. The result is that all addresses used in such a program are shifted by 7C00: the assembler simply adds 7C00 to the address calculated for each label. The effect is as if the label were located in memory at address, say, 7C48 (7C00 + 48) instead of just 0048 (0000 + 48), even though it is only 48 bytes from the beginning of the binary image file (which, once the image is loaded at offset 7C00, gives the proper address).
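
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not how NASM is implemented; the function name `label_address` and the sample file offset 0x48 (taken from the example in the paragraph) are assumptions made for the sketch.

```python
def label_address(org: int, file_offset: int) -> int:
    """Value a flat-binary assembler would encode for a label.

    org         -- the origin given by the ORG directive (0 if omitted)
    file_offset -- how many bytes into the output file the label sits
    The result is a 16-bit offset, so it is wrapped to 16 bits.
    """
    return (org + file_offset) & 0xFFFF


# Hypothetical label located 0x48 bytes into the file, as in the text above.
print(hex(label_address(0x0000, 0x48)))  # no ORG: offset counted from 0
print(hex(label_address(0x7C00, 0x48)))  # org 0x7C00: same label, shifted
```

Only instructions that encode an absolute address (such as mov si, someLabel) see this shift; the bytes of the file are otherwise laid out in the same order either way.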

These "addresses", if used directly like jmp si or mov al, [si], are the offset part of seg:off logical addressing, where in real mode the segment part is left-shifted by 4 to get a base that the offset is added to. (So 07C0:0000 and 0000:7C00 refer to the same linear address, 7C00.) The segment part comes from whatever you've put into the relevant segment register, or whatever the BIOS left there if you didn't set it to a fixed value.

If your cs, ds, and/or es segment registers are set to match where in linear address space your MBR is loaded (always 7C00), so the first byte of your file is at es:0 for example, using that offset with a correctly-set segment base will actually reach your data. jmp si will jump to that label if cs is set so cs:si is where your code is. i.e. if cs:org references the first byte of your MBR. mov ax, [si] will load 2 bytes from it if ds is set correctly.

In your case, int 10h/ah=13h uses es:bp, and there are no other uses of absolute addressing, only relative jumps/calls whose encoding doesn't depend on org. You set es from cs at the start of the bootloader for some reason, instead of setting it to a fixed value to match the org you're using. This is a bug; your bootloader won't work on BIOSes that jump to the MBR with CS:IP = 07C0:0000, only on ones that use 0000:7C00, matching your org. Fix this by replacing mov ax,cs with xor ax,ax; it doesn't matter whether DS/ES are different from CS or not, just that ES:(BootMessage-$$ + org) is where your data actually is.
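
To make the bug concrete, here is a rough Python sketch that checks, for each combination of org and ES, whether ES:BP lands on the string. The names are made up for the illustration, and the file offset 0x1E for BootMessage is an assumed value, not something taken from a listing; the conclusion does not depend on it.

```python
LOAD_ADDRESS = 0x7C00        # linear address where the BIOS loads the boot sector
MESSAGE_FILE_OFFSET = 0x1E   # assumed position of BootMessage inside the file


def bios_reads_from(org: int, es: int) -> int:
    """Linear address that int 10h/ah=13h reads when given ES:BP."""
    bp = (org + MESSAGE_FILE_OFFSET) & 0xFFFF  # immediate encoded for BootMessage
    return (es << 4) + bp


def message_reachable(org: int, es: int) -> bool:
    """True if ES:BP points at the real location of the string in memory."""
    return bios_reads_from(org, es) == LOAD_ADDRESS + MESSAGE_FILE_OFFSET


for org in (0x7C00, 0x0000):
    for es in (0x0000, 0x07C0):
        status = "ok" if message_reachable(org, es) else "wrong address"
        print(f"org {org:04X}h, ES={es:04X}h: {status}")
```

Since the question's code copies CS into ES, the ES column is whatever CS the BIOS happened to use, which is why forcing ES to a fixed value that matches the org is the safer choice.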


Linear vs. Logical addresses

As to your other question: 7C00 is the linear physical address of the bootloader. You can represent this physical address as a logical address (segment:offset) in different ways, because segments overlap (the next segment starts 16 bytes (10 in hex) after the previous one). For example, you can use logical address 0000:7C00, which is the simplest configuration: you use segment 0, starting at the beginning of your RAM, and offset 7C00 from that 0. Or you can use logical address 07C0:0000, which is the 7C0th segment. Remember that segments start 16 bytes apart from each other? So you simply multiply this 7C0 by 10 (16 in decimal) and you get 7C00 -- see? It's a matter of shifting your hexadecimal address one digit to the left! :-) Now you just add your offset, which is 0 this time, so it's still 7C00 physically: byte 0 of segment 07C0, which starts at 7C00 in memory.

Of course you can also use more complicated addresses, like, for example, 0234:58C0, which means that the segment starts at 2340 and when you add the 58C0 offset to it, you get 7C00 again :-) But doing that could be confusing. It all depends on what configuration you need. If you want to consider the 7C00 physical address as the start of your segment, just use segment 07C0 and your first instruction will be at offset 0, so you don't need an org directive at all, or you can write org 0. But if you need to read/write some data below the 7C00 address (for example, to peek at the BIOS data or fiddle with interrupt vectors), then use segment 0 and offset 7C00, which means your first instruction (0th byte in your binary file) will be located at the 7C00 physical address in memory; then you have to add the org 0x7C00 directive for the reasons described above.
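
The segment:offset examples above can be checked with a short Python sketch. The function name `linear_address` is just a label for this illustration, and the wrap to 20 bits models the original 8086 address bus, which is an assumption of the sketch (later CPUs with the A20 line enabled do not wrap).

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset.

    The mask to 20 bits is an assumption that models the 8086.
    """
    return ((segment << 4) + offset) & 0xFFFFF


# The three logical addresses mentioned in the text.
for seg, off in [(0x0000, 0x7C00), (0x07C0, 0x0000), (0x0234, 0x58C0)]:
    print(f"{seg:04X}:{off:04X} -> {linear_address(seg, off):05X}")
```

All three pairs map to the same linear address, which is the overlap between segments described above.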


The BIOS will jump to your code with CS:IP = 07C0:0000 or 0000:7C00, and with unknown values in DS/ES/SS:SP. You should write your bootloader to work either way, using xor ax,ax / mov ds,ax to set the DS base to zero if you're using org 0x7c00.

See Michael Petch's general tips for bootloader development for more about writing robust bootloaders that avoid making assumptions about the state the BIOS left, except for ones that all BIOSes must get right to work at all with mainstream software. (e.g. loading your 512-byte MBR at linear address 0x00007c00 and drive number in DL).

Almost(?) all BIOSes start an MBR with either CS=0 or CS=07C0, not some other seg:off way of reaching the same linear address. But you definitely shouldn't assume one or the other.

Peter Cordes
SasQ
  • You mentioned "Remember that segments start 16 bytes apart from each other?", is this always the case? Or can you let me know when it would differ? – supmethods Aug 31 '18 at 01:03
  • Yes, in real mode, it's always the case. On old computers it was done this way in hardware (address lines shifted by 4 bits in the memory management unit when calculating addresses). When Intel introduced protected mode in their CPUs (from 286 up), segment registers have different behaviour: they index in special tables in memory (LDT, GDT or IDT), and these tables tell the CPU where the particular segments start. However, in real mode, they're still 16 bytes apart (and overlapping), for backward compatibility. – SasQ Sep 02 '18 at 11:22
  • @PeterCordes: Although I appreciate what you added about the general tips for bootloader development, I think that your edit that changes my "code" segment to "data" segment is terribly misrepresenting what I said. I remind you that the question was about the ORG directive, so in my answer I was talking about the ORG directive determining the assumed address from which the INSTRUCTIONS (that is, CODE) will be generated by the assembler. Changing it to DATA puts words in my mouth that I didn't say and never meant to say. (That's why I hate people editing my posts instead of writing their own.) – SasQ Jan 23 '22 at 04:54
  • Oh, did you not intentionally make this answer community wiki? ORG has no effect on normal (non-far) jumps and calls, because they're relative (`call rel16` is encoded to the same 3 bytes regardless of ORG). The important part is that it *does* affect how NASM calculates the address for `mov ax, BootMessage`. (Which should just be `mov bp, BootMessage`...). That address is only ever used as an offset from the ES base (by [the `int 10h` / `ah=13h` BIOS write-string call](https://en.wikipedia.org/wiki/INT_10H)), not with CS. – Peter Cordes Jan 23 '22 at 05:57
  • There aren't separate DATA and CODE segments in the file NASM is making; it's a flat binary. And the source doesn't even use `section .data` or `section .text` anywhere. (If you do, NASM still flattens it when assembling, with `mov reg, symbol` getting an appropriate offset for wherever NASM put it in the file. NASM doesn't know about memory models; I think the offset is always relative to the top of the file (ORG) whether or not you use a `section .data`. (`section` is a synonym for segment, and [there's no `assume` directive](https://nasm.us/doc/nasmdoc2.html#section-2.2.4).) – Peter Cordes Jan 23 '22 at 06:01
  • Of course, if you did something like `mov ax, DispStr` / `call ax`, then the relevant segment *would* be CS. But NASM doesn't know or care what you're going to do with it when calculating what immediate to encode for `mov ax, symbol`; it doesn't make any distinction between code or data. `db` and `add` are both ways to emit bytes into the output at whatever position you use them. – Peter Cordes Jan 23 '22 at 06:06
  • I checked, and yeah even with a `section .data`, NASM fills in absolute symbol addresses as offsets relative to the start of the whole flat binary, from whatever ORG is there. (NASM's `-l /dev/stdout` listing shows relative to sections, but `ndisasm -b16` shows the truth: https://godbolt.org/z/T61a6j18G). If you have seg regs set to match, either of our descriptions are sort of right. (NASM also combines all of the `.text` section together, even if you switch back and forth, like a separate linker would, but you can do `section .text2` to put more code after your data.) – Peter Cordes Jan 23 '22 at 06:25
  • @PeterCordes _"Oh, did you not intentionally make this answer community wiki?"_ I might, I don't remember after that many years. But whether I did or not, does it allow people to change and misrepresent what I said? :q I don't think that any of the CC licenses allow for infringing the author's PERSONAL copyrights (e.g. right for attribution, right for not misrepresenting or corrupting his work, etc.), and yet people seem to think that they can do whatever they like with other people's works here on Stack, which is hideous. – SasQ Jan 23 '22 at 19:03
  • @PeterCordes _"ORG has no effect on normal (non-far) jumps and calls"_ But it _does_ have an effect on the assumed address at which the generated instructions will be located in the address space. And that's what I was talking about. – SasQ Jan 23 '22 at 19:05
  • @PeterCordes _"There aren't separate DATA and CODE segments in the file NASM is making; it's a flat binary."_ I never said there are. I started from describing how NASM generates data & code sections in a normal mode of operation, but then described how it gets different when it generates a FLAT BINARY code. I specifically said that in flat binaries there's no code & data SECTIONS, but in segmented memory, there's still CODE and DATA SEGMENTS. Instructions are never executed in a DATA segment, because they're being executed from CS:IP. You changing it to DATA is WRONG. – SasQ Jan 23 '22 at 19:09
  • *does it allow people to change and misrepresent what I said?* - Explicitly yes, community wiki means you're *not* really associating your name with it, like a Wikipedia article. If future readers think something is explained poorly or wrong in a community wiki answer, they should change it, unlike with regular answers. See [What are "Community Wiki" posts?](https://meta.stackexchange.com/q/11740) on meta. Copyright has nothing to do with anything; I don't own your original, I just made a derivative work following the CC-BY-SA licence you released it under by posting it to SO. – Peter Cordes Jan 23 '22 at 19:34
  • A "derivative work" would be if you made a NEW post based on my own, and precisely stated which parts are your own contributions. Changing someone's work in place, which is still firmed with his name, is NOT a derivative work. Stack people got it wrong, and I wish I had lawyers to settle this thing with them in court. – SasQ Jan 23 '22 at 19:37
  • _"If future readers think something is explained poorly or wrong"_ But there _wasn't_ anything "wrong" in my original answer! It is _now_, after you changed it! – SasQ Jan 23 '22 at 19:38
  • I had a 2nd look at what you were saying about code vs. data. Yes, you're right there was a conflict, I should have also changed the part earlier in that paragraph to say "code + data" because the whole flat binary is a combination of both. Also, not talk about segments and segment registers at all there, because that's not how NASM thinks when it's calculating symbol addresses for flat binaries. I moved the discussion of segments to fully after the paragraphs about assuming org 0 vs. specifying an org. (Some of what I wrote is redundant with your later logical vs. linear section, sorry.) – Peter Cordes Jan 23 '22 at 22:19
  • Re: copyright: you own the copyright to your contributions. But Stack Overflow decides which edit to show in this space. Go read again the link about what "community wiki" means on answers; my assumptions about how much change I can/should make when editing was based on it being CW. There's also a meta [Ownership of content in Community Wiki posts](https://meta.stackexchange.com/q/1084) about attribution, although that's talking about quoting/citing an answer in a book or something, not "ownership" of the space where Stack Overflow displays someone's version. – Peter Cordes Jan 23 '22 at 22:23
  • I get what you're saying about putting words in someone else's mouth, regardless of how copyright law works. I think the key point there is the fact that this post is community wiki, so people looking at it know it's not entirely yours. I still think the original version needed improvement. My initial edit had some flaws, though, thanks for getting me to take a 2nd look. I made a major edit to change where my new paragraphs go, and I think remove the problems you identified. (At least the ones I think were actual problems.) – Peter Cordes Jan 23 '22 at 22:28

This comes up when you have an assembler and linker in one step. The org tells the assembler, which tells the linker (in these cases often the same program), where in physical memory space the code that follows will live. When you use a C compiler or some other high-level language compiler, you often have separate compile and link steps (although the compiler often calls the linker for you behind the scenes). The source is compiled to a relocatable object file with some of the addresses in the instructions left unresolved, waiting on the link step. The linker takes the objects and a linker script or information from the user describing the memory space, and from there encodes the instructions for that memory space.

User786653 said it quite well: it tells the assembler something it can't figure out on its own, namely the memory space/address where these instructions are going to live, in case there is a need to make position-dependent encodings in the instructions. The assembler also uses that information in the output binary if it is a format that includes address information, for example elf, srec, ihex, etc.

old_timer
  • thx, but what does `org 7c00h` mean? the segment is 7c00h, or the offset is 7c00h? which tool can i use to detect the address? – Searene Apr 25 '12 at 07:53
  • all right, using the bochs debugger, i found that the 0x7c00 was added in the offset part of the address; without the first line, `org 07c00h`, the system will load the wrong address of string `BootMessage`, thx a lot. i learned a lot. – Searene Apr 25 '12 at 08:13