
I am learning x86 assembly out of curiosity, to understand low-level details, and came across this great repository here that contains lots of examples that can be run from the EFI shell.

When I check this hello world example, there is a linker script with these contents:

ENTRY(mystart)
SECTIONS
{
  . = 0x7c00;
  .text : {
    entry.o(.text)
    *(.text)
    *(.data)
    *(.rodata)
    __bss_start = .;
    /* COMMON vs BSS: https://stackoverflow.com/questions/16835716/bss-vs-common-what-goes-where */
    *(.bss)
    *(COMMON)
    __bss_end = .;
  }
  /* https://stackoverflow.com/questions/53584666/why-does-gnu-ld-include-a-section-that-does-not-appear-in-the-linker-script */
  .sig : AT(ADDR(.text) + 512 - 2)
  {
      SHORT(0xaa55);
  }
  /DISCARD/ : {
    *(.eh_frame)
  }
  __stack_bottom = .;
  . = . + 0x1000;
  __stack_top = .;
}

I am not able to understand why exactly it is required. Is it just to specify the load address? My general understanding of linker scripts was that they are most useful when there is more than one object file, where the script defines how sections from multiple object files are combined into a single executable.

What if I don't specify the linker script in this example? (There are definitely at least two object files: one built from the .s file and one from the .c file.)

Michael Petch
Naveen
  • Couldn't you just try not specifying the linker script and see what happens? You could do that in less time than it took you to write your question here. – Ken White Jan 12 '20 at 04:26
  • Sorry for the lack of clarity in my question. The point was not just to know the immediate outcome. It might be possible that I could run it without a linker script using some tricks/hacks. As the title says, I want to understand what role linker scripts play. I have seen Linux kernel code that starts in assembly before entering the C code, and it defines linker scripts. Is there a way to escape from that? To add, I have no prior experience with linker scripts. It's just theoretical definitions and reasons I have read. – Naveen Jan 12 '20 at 04:38
  • I recommend almost anything other than x86 as your first adventure into assembly language and bare metal. You can't use the "it's what I have" excuse, as for every x86 you have several ARM-based items (you have more 8051s than x86s too). Even better, start with a simulator/emulator, as you can't crash/break it and you have far greater visibility into what is going on to get through the beginner issues. But if you insist, a tutorial like that may get you started... – old_timer Jan 12 '20 at 04:40
  • Learn to master the tools first, long before you start actually running code. So create/modify linker scripts, try command-line options, and arrange items on the command line in a different order, constantly examining the disassembly and other views of the result. If you cannot master the toolchain you will fail to boot a bare-metal system, as you have to place some of the items in specific places in the address space for the processor to boot properly. So you also need to know how your code will boot, be it from a BIOS/EFI bootloader or directly from the processor. – old_timer Jan 12 '20 at 04:43
  • @old_timer - Thanks. "start with a simulator/emulator....you have far greater visibility into what is going on to get through the beginner issues". Does an emulator like QEMU provide any debugging advantages? I was assuming it would give me the same black screen as bare metal if something goes wrong... Can you please clarify this a little more, so that I can explore more and prefer an emulator? – Naveen Jan 12 '20 at 04:45
  • Depends on the target you choose; sometimes I simply write my own instruction set simulator. I can't remember the one I examined, pc emu or something; there is Bochs of course, and some others. If you insist on x86, I would start with the 8088/8086 first and then work toward the 80386 and on to the present, rather than diving in with the present. It really is not the first choice. The pdp-11 is in some respects a very good first choice, but it makes more sense in octal than hex; simh is out there and works just fine as an environment. For MIPS you can use spim, RISC-V has a couple of well-known simulators, etc. – old_timer Jan 12 '20 at 04:49
  • You should probably try a few, not just one. The msp430 appears to be a direct descendant of, or heavily influenced by, the pdp-11, so it has a lot of the good/clean features, but they are CISC. I wouldn't learn MIPS first on the RISC side; I would go with ARM Thumb first. There are countless $5 and $10 boards you can buy to play with after you figure out how to get started. RISC-V may actually displace ARM in certain markets, and that may happen as an explosion very soon; it is pretty easy to write your own simulator or find one for RISC-V. But it is heavily influenced by MIPS, so it doesn't represent a typical instruction set. – old_timer Jan 12 '20 at 04:52
  • You should make this education fun, which is why it is on you to choose the path. I'm trying to maximize your success and minimize failure. There are many traps and not a lot of folks with the right experience, and it is hard to reach out and find those folks even with a site like this. – old_timer Jan 12 '20 at 04:54
  • @old_timer : May we please move the discussion to chat https://chat.stackoverflow.com/rooms/205816/59700854 . I promise I will not take more than 10 minutes of yours. But this 10 minutes may definitely guide me well. – Naveen Jan 12 '20 at 04:57

1 Answer


Note that that is a bare-metal example, meaning no operating system.

The GNU toolchain installed on your computer was most likely built for that computer, including its operating system.

So when you run apt-get install build-essential and then gcc hello.c -o hello, the linker script used was part of the installed toolchain and was specific to Linux and your distro. (Even if you build the toolchain and libc from sources, the build detects the host and, unless it is being built as a cross compiler, makes the stock bootstrap and linker script for that host the default.)

When you find and install a GNU toolchain for Windows, the linker script buried in that install is specific to Windows.

But when you want to use a toolchain as a cross compiler, in this case for bare metal, you need to link for the target environment, which usually means bringing along your own linker script. This one is over-complicated, as usual, but at least they provided one.

Being x86 bare metal and using an x86 host for development, you can (sometimes) use a native compiler as a cross compiler. The same goes for building for ARM on an ARM host (a Raspberry Pi, for example), etc.

Without a linker script when cross compiling, the default one will be used, and if you have not customized the default for your target you will likely get a build that won't work.

The job of a linker script is primarily to define the address space to the linker: I want .text at this address, I want .data at this address, and so on. You can do this on the command line without a linker script, but a script becomes the simpler option the more complicated your layout gets, and GNU ld has some issues (bugs) with command line vs. linker script. The secondary reason is that specific languages need a bootstrap, and some language assumptions need to be met in that bootstrap; to facilitate that, the address-space information the linker script defines has to be passed on to the bootstrap. You are letting the linker/tools do the work for you.

So for C it is assumed that .bss is zeroed and .data is filled with the items you asked for before the entry point to your code is called (generally main(), but in bare metal you can do whatever you want and often don't want to use that function name). As a labor-saving device you use a linker to place all the items where you asked: all the text, all the bss, data, rodata, etc. It also patches up external connections between the functions. But now the linker knows where .bss is and how big it is, for example, so how do you communicate that to the bootstrap code? GNU and other toolchains provide a mechanism for that. (GNU's solution is not expected to be portable to any other toolchain; assume all linker script languages are custom to the toolchain and non-portable, so you have to write a new linker script and a new bootstrap for each toolchain.) You can create variables in the linker script which the linker fills in with whatever you ask for: the starting and ending address of .bss, or, with more math in the linker script, the starting address and size of .bss. You then import those variables into the bootstrap assembly language code (you can't use C; that is a chicken-and-egg problem), and now the bootstrap can zero out .bss.
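To make the .bss step concrete, here is a minimal sketch of the logic only. It is written in C purely for readability and runs on a hosted system; as said above, the real thing belongs in the assembly bootstrap. The names zero_region and fake_bss are made up for illustration. On the target, the two pointers would instead be the __bss_start and __bss_end symbols from the linker script in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: on the bare-metal target these bounds come from the linker
 * script and would be imported roughly as
 *     extern uint8_t __bss_start[], __bss_end[];
 * Here an ordinary array stands in for .bss so the sketch can run hosted.
 */
static uint8_t fake_bss[64];

/*
 * Zero every byte in [start, end) and return how many bytes were cleared.
 * The volatile pointer is there to discourage the compiler from replacing
 * the loop with a call to memset, which may not exist on bare metal.
 */
static size_t zero_region(uint8_t *start, uint8_t *end)
{
    assert(start <= end);
    size_t cleared = 0;
    for (volatile uint8_t *p = start; p < end; ++p) {
        *p = 0;
        ++cleared;
    }
    return cleared;
}
```

On the target the call would look something like zero_region(__bss_start, __bss_end), made before any compiled code that relies on zeroed globals runs. The linker script supplies only the two addresses; the loop that actually clears the memory is the bootstrap's job.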

So I call this a marriage between the bootstrap code and the linker script, both of which are toolchain specific, for more than one reason. First, assembly language is defined by the assembler, not the target, so there is no reason to assume that x86 assembly language for one toolchain is compatible with another toolchain's assembler (this has nothing to do with Intel vs. AT&T syntax). Second, the linker script language is also specific to its toolchain and not assumed to be portable across toolchains. So you are using languages specific to the toolchain, and for C, as an example, you have tasks that you have to perform before calling any of the compiled code. The two or more files that make up linking and bootstrap are intimately connected.

Note that this example also has some bootstrap code included. I would look for a cleaner example with real assembly rather than inline assembly, especially since there is an assembly language file in the project; the C part could have demonstrated C instead of being a scripty inline assembly language thing. It does appear to link to a tutorial that explains what is going on, so perhaps all of this is explained there.

A beauty of bare metal is that you can do whatever you want; you have fewer rules to live by, and this author has made use of that. I personally don't expect .bss to be zeroed and don't use .data, so my non-portable portions, the linker script and bootstrap, are much, much less complicated. You are welcome to your own style and preferences; that is the beauty of bare-metal programming.

halfer
old_timer
  • This answer and the comments are exactly what I have been looking for. In fact I had a few more questions, which you magically predicted and answered all in one go. Thanks a lot. – Naveen Jan 12 '20 at 04:48