
I am learning assembly for fun, and today is only my third day. Perhaps I have misunderstood the location counter in linker scripts. As I understand it, the location counter defines the address in memory (physical or virtual) at which the sections must be loaded.

However, the following linker script, taken from this SO post, also seems to alter the resulting image: it puts the magic number in the last 2 bytes of the resulting MBR image.

link.ld

SECTIONS
{
    /* The BIOS loads the code from the disk to this location.
     * We must tell that to the linker so that it can properly
     * calculate the addresses of symbols we might jump to.
     */
    . = 0x7c00;
    .text :
    {
        __start = .;
        *(.text)
        /* Place the magic boot bytes at the end of the first 512 sector. */
        . = 0x1FE;
        SHORT(0xAA55)
    }
}

My code is:

main.S

.code16
    mov $msg, %si
    mov $0x0e, %ah
loop:
    lodsb
    or %al, %al
    jz halt
    int $0x10
    jmp loop
halt:
    hlt
msg:
    .asciz "hello world"

I assemble and link with:

as -g -o main.o main.S
ld --oformat binary -o main.img -T link.ld main.o
qemu-system-x86_64 -hda main.img

I soon realized that the option --oformat binary has something to do with this, since leaving it out does not produce a 512-byte image. Maybe I should be looking at ELF vs. binary format? Can someone please help me understand why the binary format was used and how it interprets the location counter (it must have done something with . = 0x7C00 as well)?

Hexdump of the resulting 512-byte hello world image gives me this:

00000000  bf 0f 7c b4 0e ac 08 c0  74 04 cd 10 eb f7 f4 68  |..|.....t......h|
00000010  65 6c 6c 6f 20 77 6f 72  6c 64 00 66 2e 0f 1f 84  |ello world.f....|
00000020  00 00 00 00 00 66 2e 0f  1f 84 00 00 00 00 00 66  |.....f.........f|
00000030  2e 0f 1f 84 00 00 00 00  00 66 2e 0f 1f 84 00 00  |.........f......|
00000040  00 00 00 66 2e 0f 1f 84  00 00 00 00 00 66 2e 0f  |...f.........f..|
00000050  1f 84 00 00 00 00 00 66  2e 0f 1f 84 00 00 00 00  |.......f........|
00000060  00 66 2e 0f 1f 84 00 00  00 00 00 66 2e 0f 1f 84  |.f.........f....|
00000070  00 00 00 00 00 66 2e 0f  1f 84 00 00 00 00 00 66  |.....f.........f|
00000080  2e 0f 1f 84 00 00 00 00  00 66 2e 0f 1f 84 00 00  |.........f......|
00000090  00 00 00 66 2e 0f 1f 84  00 00 00 00 00 66 2e 0f  |...f.........f..|
000000a0  1f 84 00 00 00 00 00 66  2e 0f 1f 84 00 00 00 00  |.......f........|
000000b0  00 66 2e 0f 1f 84 00 00  00 00 00 66 2e 0f 1f 84  |.f.........f....|
000000c0  00 00 00 00 00 66 2e 0f  1f 84 00 00 00 00 00 66  |.....f.........f|
000000d0  2e 0f 1f 84 00 00 00 00  00 66 2e 0f 1f 84 00 00  |.........f......|
000000e0  00 00 00 66 2e 0f 1f 84  00 00 00 00 00 66 2e 0f  |...f.........f..|
000000f0  1f 84 00 00 00 00 00 66  2e 0f 1f 84 00 00 00 00  |.......f........|
00000100  00 66 2e 0f 1f 84 00 00  00 00 00 66 2e 0f 1f 84  |.f.........f....|
00000110  00 00 00 00 00 66 2e 0f  1f 84 00 00 00 00 00 66  |.....f.........f|
00000120  2e 0f 1f 84 00 00 00 00  00 66 2e 0f 1f 84 00 00  |.........f......|
00000130  00 00 00 66 2e 0f 1f 84  00 00 00 00 00 66 2e 0f  |...f.........f..|
00000140  1f 84 00 00 00 00 00 66  2e 0f 1f 84 00 00 00 00  |.......f........|
00000150  00 66 2e 0f 1f 84 00 00  00 00 00 66 2e 0f 1f 84  |.f.........f....|
00000160  00 00 00 00 00 66 2e 0f  1f 84 00 00 00 00 00 66  |.....f.........f|
00000170  2e 0f 1f 84 00 00 00 00  00 66 2e 0f 1f 84 00 00  |.........f......|
00000180  00 00 00 66 2e 0f 1f 84  00 00 00 00 00 66 2e 0f  |...f.........f..|
00000190  1f 84 00 00 00 00 00 66  2e 0f 1f 84 00 00 00 00  |.......f........|
000001a0  00 66 2e 0f 1f 84 00 00  00 00 00 66 2e 0f 1f 84  |.f.........f....|
000001b0  00 00 00 00 00 66 2e 0f  1f 84 00 00 00 00 00 66  |.....f.........f|
000001c0  2e 0f 1f 84 00 00 00 00  00 66 2e 0f 1f 84 00 00  |.........f......|
000001d0  00 00 00 66 2e 0f 1f 84  00 00 00 00 00 66 2e 0f  |...f.........f..|
000001e0  1f 84 00 00 00 00 00 66  2e 0f 1f 84 00 00 00 00  |.......f........|
000001f0  00 66 2e 0f 1f 84 00 00  00 00 00 0f 1f 00 55 aa  |.f............U.|
00000200

I don't understand the impact of . = 0x7C00 here. Is that information lost? (Maybe it is not needed, because the BIOS would load the image at 0x7C00 anyway.)

Naveen
  • That information is used during linking, while performing the symbol resolution. It is of course not explicitly stored in a raw binary as that has no headers and as such no place to store metadata. – Jester Jan 23 '20 at 15:45
  • I am confused because of "output section address" : http://www.scoberlin.de/content/media/http/informatik/gcc_docs/ld_3.html#SEC20 . I am likely wrong in my understanding that the "output section address" is used to define the load address which somehow is stored in the binary that is used by loader to load at the defined address. Please correct me in this understanding of "output section address" – Naveen Jan 23 '20 at 15:50
  • That applies to loaders that actually consult the binary which has this information stored inside it. Raw binary does not have this information, and the BIOS loads it at `0x7c00` automatically, as you said. The only reason it is needed is to perform the linking so that the code actually works at that address. – Jester Jan 23 '20 at 15:53
  • "Raw binary does not have this information", you mean the image generated with `oformat binary`? – Naveen Jan 23 '20 at 15:55
  • Yes, that is correct. – Jester Jan 23 '20 at 15:58
  • If there is no load info in the raw binary, then the above script without `. = 0x7c00` should also work, since in that case it will start writing the text section from address 0 and then put the magic number in bytes 511 and 512. But it does not work. I was expecting ld to dump the first 512 bytes just like in the working case. What am I not understanding? – Naveen Jan 23 '20 at 16:05
  • Not sure what you mean. What doesn't work? – Jester Jan 23 '20 at 16:21
  • I mean it does not boot when I skip the line `. = 0x7c00`. What goes wrong in that case? – Naveen Jan 23 '20 at 16:59
  • As I said, that is needed to perform the linking so that the code is relocated to the proper load address. You did not show the code but presumably there is a mismatch otherwise. Real mode code can be written to be 0 based in which case you would not need the `. = 0x7c00` but apparently that is not true in your case. – Jester Jan 23 '20 at 17:07
  • `mov $msg, %si` uses an absolute address, so it depends on the `org` aka `. = 0x7c00`. Your `jmp loop` instruction uses a `rel8` branch displacement, so it doesn't. The other instructions don't involve addresses. BTW, [prefer `test %al,%al`, it's more efficient than `or %al,%al`](https://stackoverflow.com/questions/33721204/test-whether-a-register-is-zero-with-cmp-reg-0-vs-or-reg-reg/33724806#33724806). – Peter Cordes Jan 24 '20 at 01:11

1 Answer

. = 0x7c00;
.text :
{
    __start = .;
    *(.text)
    /* Place the magic boot bytes at the end of the first 512 sector. */
    . = 0x1FE;
    SHORT(0xAA55)
}

. = 0x7c00; tells the linker (this is linker-script syntax, not assembly language, by the way) that the next thing should be placed at address 0x7C00 in the processor's address space. With .text right below it, this means the .text code is linked starting at address 0x7C00, so anything position-specific is computed relative to that address.

__start = .; records the address at this point (the start of .text) in the symbol __start.

*(.text) puts all the .text code from the input files here.

. = 0x1FE; moves the location counter to offset 0x1FE within .text. Inside an output section, . is an offset from the start of that section, not an absolute address.

SHORT(0xAA55) places these two bytes at offsets 0x1FE and 0x1FF in .text.

So, assuming the code fits, this produces a 0x200-byte blob that is meant to be loaded at 0x7C00 in the address space.
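The arithmetic behind this walkthrough can be checked with a few lines of Python. This is only a sketch of the address bookkeeping, not of what ld does internally; the variable names are made up for illustration, and it assumes the usual GNU ld rule that "." is an absolute address outside an output section and an offset from the section start inside one.

```python
# Assumed model of the location counter in the script above:
# outside an output section "." is an absolute address; inside the
# braces of .text it is an offset from the start of that section.
SECTION_BASE = 0x7C00      # from ". = 0x7c00;" before ".text :"
SIGNATURE_OFFSET = 0x1FE   # from ". = 0x1FE;" inside ".text"

start_address = SECTION_BASE                         # value of __start
signature_address = SECTION_BASE + SIGNATURE_OFFSET  # where SHORT(0xAA55) lands in memory
section_size = SIGNATURE_OFFSET + 2                  # SHORT emits two bytes
signature_bytes = (0xAA55).to_bytes(2, "little")     # x86 stores the low byte first

print(hex(start_address))        # 0x7c00
print(hex(signature_address))    # 0x7dfe
print(hex(section_size))         # 0x200
print(signature_bytes.hex(" "))  # 55 aa
```

The last line matches the final two bytes (55 aa) of the hexdump in the question, and 0x200 is the 512-byte sector size.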

Now when you run objcopy -O binary hello.elf hello.bin, the tool looks for the first loadable thing, and that becomes the first portion of the output file. If this is the only thing you have in the "binary", then those 0x200 bytes are what goes into hello.bin. (Linking with ld --oformat binary, as in the question, writes the same kind of raw image directly.)

The information that tells you 0x7C00 is where the processor needs to find this is lost in the -O binary file format. ELF carries it, and so do other formats, but a raw binary does not.

Further, if you had these 0x200 bytes at 0x7C00 and another 2 bytes at 0x8000, the -O binary output would be 0x402 bytes long. The first 0x200 bytes would come from .text at 0x7C00, the lowest loadable thing, followed by 0x200 bytes of padding so that the next two bytes sit at the right place relative to the beginning of the file: if you took hello.bin and loaded it at 0x7C00, those two bytes would end up at 0x8000.

If instead you had these 0x200 bytes at 0x7C00 and added another item to the linker script with 2 bytes at 0x7000, then hello.bin would start with those two bytes, followed by 0xBFE bytes of padding and then the 0x200 bytes of .text, so that when the file is loaded into memory at 0x7000, both the two bytes and the 0x200 bytes are at the proper place.

So objcopy -O binary creates essentially a memory image of what needs to be loaded, sometimes with padding, but without information as to what the starting address is for that load. That you have to just know.
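The two padding examples above follow from one rule: the lowest load address becomes file offset 0, and everything else sits at its address minus that base. Below is a minimal Python sketch of that rule. The function name flat_binary_layout and the section name "extra" are hypothetical, chosen only for illustration; this mimics the layout described here rather than calling objcopy itself.

```python
def flat_binary_layout(sections):
    """Lay out sections the way a raw memory image would.

    sections maps a name to (load_address, size_in_bytes).
    Returns (base_address, {name: file_offset}, total_file_size).
    """
    base = min(addr for addr, _ in sections.values())
    offsets = {name: addr - base for name, (addr, _) in sections.items()}
    total = max(addr + size for addr, size in sections.values()) - base
    return base, offsets, total


# Case 1: .text at 0x7C00 plus two hypothetical bytes at 0x8000.
base1, offsets1, total1 = flat_binary_layout(
    {".text": (0x7C00, 0x200), "extra": (0x8000, 2)}
)
padding1 = offsets1["extra"] - 0x200   # gap between end of .text and "extra"
print(hex(total1), hex(padding1))      # 0x402 0x200

# Case 2: the two hypothetical bytes sit below .text, at 0x7000.
base2, offsets2, total2 = flat_binary_layout(
    {".text": (0x7C00, 0x200), "extra": (0x7000, 2)}
)
padding2 = offsets2[".text"] - 2       # gap between "extra" and .text
print(hex(base2), hex(padding2))       # 0x7000 0xbfe
```

In both cases the file itself never records the base address; it is only implied by where the loader puts the image.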

The ELF file will contain the 0xAA55 as well in some form. I would assume the whole 0x200 bytes show up as one item in .text, but perhaps it is split into two. How it is split, and what the padding is, depends on the tool that created the ELF file.

old_timer
  • I've updated the question with the actual code used and the commands and tools he used to build with. They happened to be identical to the SO answer he linked to in the question. – Michael Petch Jan 24 '20 at 01:05
  • When I skip the line `. = 0x7c00` in the linker script, the text section should be linked at address 0. With `--oformat binary`, this first loadable thing should get picked up in the output image. I am getting a correct 512-byte image, but instead of "hello world" it prints some garbage characters. I compared the hexdumps from before and after removing this line, and I can see that the first 3 bytes of the output image have changed. Why is this happening? And why don't I get a proper "hello world"? – Naveen Jan 25 '20 at 10:01
  • Because you linked for 0x0000 but the image is loaded at 0x7C00, the code is grabbing data from, or jumping to, an address that is not the right one. Do a diff of the two binaries to see which bytes differ, then examine the disassembly of both at those locations. You should see that either a branch or a data access uses a specific address, and that in one case it is 0x7C00-based, which is correct, and in the other 0x0000-based, which is incorrect. – old_timer Jan 25 '20 at 14:25
  • If I understand your problem, the binary is loaded at 0x7C00 either way, so you have to link for that. You are diving into things unrelated to assembly language, by the way, although they are quite valuable educationally. And while on that: x86 is the last instruction set you want to learn; you want to learn some other, better instruction set first. But this is a very interesting way to start despite it being x86, and I can respect this path. – old_timer Jan 25 '20 at 14:27
  • I understand this now. I don't have good disassembly skills yet, but with a little searching I could see that the %si register will hold an incorrect address if I link with a 0-based load address but later load the image at 0x7c00. Hence, the program does start executing, but when it comes to printing the string, it prints garbage characters. – Naveen Jan 26 '20 at 03:41