Bare bones OS kernel programming

Question

I have recently started to take an interest in operating systems. I have a couple of things weighing on my mind, but I have decided to split them into separate questions.

Let's assume we're designing a kernel for a new instruction set architecture that's out on the market. There are no C runtime libraries, no nothing. Only a compatible compiler for that ISA.

Presumably, this means that the only C constructs available to the kernel programmer are basic assignment operators, bitwise operators and loops. Is this correct?

If so, how are more complex things like main memory I/O and process scheduling achieved on the lowest level? Can they only be implemented in pure assembly?

What does it mean, then, for a kernel to be written in C (Linux, for example)? Are some parts of the kernel inherently written in assembly?

Runtime libraries are compiled by the "compatible compiler for that ISA". So what is the problem? Memory I/O is achieved by low-level code written for the specific architecture using the same "compatible compiler". Process scheduling is a high-level task, so I won't even mention it. — Eugene Sh., Mar 06 '15 at 20:14
Please bear in mind that C was originally designed to facilitate the construction of an OS. — Weather Vane, Mar 06 '15 at 20:26
Go to kernel.org and look at the source for yourself. Almost all of it is in C--even device drivers and other very low-level stuff, but yes, there are bits of assembly for things like switching processor modes and such. — Lee Daniel Crocker, Mar 06 '15 at 20:26
I find the third paragraph baffling. It's like you're insinuating that bitwise operations exist but arithmetic operations don't? Loops yes, but no switch statements or functions? Where is that coming from? — Kerrek SB, Mar 07 '15 at 00:13
@Kerrek Those are also "basic" constructs that would be available, I was just naming examples. What I meant was anything that could be implemented trivially with assembly using simple conditional branches and jumps. — Mark, Mar 07 '15 at 05:55

tux3 · Accepted Answer · 2016-01-28T13:33:31.253

Presumably, this means that the only C constructs available to the kernel programmer are basic assignment operators, bitwise operators and loops. Is this correct?

Pretty much all C language features will still work in your kernel without needing any particular runtime support. Your C compiler will translate them to assembly that runs just as well in kernel mode as it would in a normal user-mode program.

However, libraries such as the C standard library will not be available; you will have to write your own implementation. In particular, this means no malloc and free until you implement them yourself.
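As a rough sketch of what "write your own implementation" can look like, here are minimal versions of three library routines. They rely only on `<stddef.h>`, which a compiler provides even for a freestanding build. The `k_` prefix is a hypothetical naming choice to avoid clashing with a hosted libc, and the loops are written for clarity rather than speed.

```c
#include <stddef.h>  /* size_t: supplied by the compiler, not by libc */

/* Fill n bytes at dst with the low byte of value. */
void *k_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

/* Copy n bytes from src to dst; the regions must not overlap. */
void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Count the characters before the terminating NUL. */
size_t k_strlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
```

Note that GCC may emit calls to `memcpy`, `memset`, `memmove` and `memcmp` on its own even in freestanding mode, so a real kernel usually ends up defining those exact names as well.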

If so, how are more complex things like main memory I/O and process scheduling achieved on the lowest level? Can they only be implemented in pure assembly?

Memory I/O is something much more low level that is handled by the CPU, BIOS, and various other hardware on your computer. The OS thankfully doesn't have to bother with this (with some exceptions, such as some addresses being reserved, and some memory management features).

Process scheduling is a concept that doesn't really exist at the machine code level on most architectures. x86 does have a concept of tasks and hardware task switching, but nobody uses it. Scheduling is an abstraction set up by the OS as needed, so you would have to implement it yourself. If you do not want to spend the effort, you could decide to have a single-tasking OS; it will still work.
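To illustrate that the policy side of scheduling is ordinary C bookkeeping, here is a minimal round-robin sketch. Every name in it (`struct task`, `pick_next_task`, `schedule`, `MAX_TASKS`) is hypothetical, and the actual saving and restoring of registers, which needs assembly, is only marked by a comment.

```c
#include <stddef.h>

#define MAX_TASKS 4   /* arbitrary size chosen for the sketch */

enum task_state { TASK_UNUSED, TASK_READY, TASK_BLOCKED };

struct task {
    enum task_state state;
    void *saved_sp;   /* where context-switch code would keep the stack pointer */
};

static struct task tasks[MAX_TASKS];
static int current = 0;

/* Scan the table starting just after the current task and return the
 * first ready one. The current task is checked last, so it keeps
 * running when nothing else is ready. Returns -1 if no task is ready. */
int pick_next_task(void)
{
    for (int i = 1; i <= MAX_TASKS; i++) {
        int candidate = (current + i) % MAX_TASKS;
        if (tasks[candidate].state == TASK_READY)
            return candidate;
    }
    return -1;
}

/* Would typically be called from a timer interrupt. Returns the index
 * of the task that should run next. */
int schedule(void)
{
    int next = pick_next_task();
    if (next >= 0 && next != current) {
        /* A real kernel would call an assembly routine here to save the
         * registers of tasks[current] and restore those of tasks[next]. */
        current = next;
    }
    return current;
}
```

A single-tasking OS simply never needs this table.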

What does it mean, then, for a kernel to be written in C (Linux, for example)? Are some parts of the kernel inherently written in assembly?

Some parts of the kernel will be heavily architecture-dependent and will have to be written in ASM. For example, on x86, switching between modes (e.g. to run 16-bit code, or as part of the boot process) and interrupt handling can only be done with certain privileged instructions. The reference manual of your architecture of choice, such as the Intel® 64 and IA-32 Architectures Software Developer’s Manual for x86, is the first place to look for those kinds of details.

But C is a portable language, so it has no built-in notion of such low-level, architecture-specific concepts (although you could in theory do everything from a .c file with compiler intrinsics and inline ASM). It is more useful to abstract them away in assembler routines and build your C code on top of a clean interface that you could keep if you wanted to port your OS to another architecture.

If you are interested in the subject, I highly recommend you pay a visit to the OS Development Wiki. It's a great source of information about operating systems, and you'll find many hobbyists who share your interest.

And how would one implement malloc? Is it just a matter of pointer bookkeeping? — Mark, Mar 06 '15 at 21:06
@Mark Basically, yes. One straightforward implementation is to keep a compact linked list of chunks of memory. Each chunk has a header telling you whether it's used, its size (so you can find the next one), and maybe a magic number to detect memory corruption. When someone calls `malloc`, you find a free chunk of the appropriate size, mark it used, and return a pointer to it. When someone calls `free`, you go to that chunk's header and mark it as not used. Of course there are many more possible implementations, and you can try implementing it in user mode too. — tux3, Mar 06 '15 at 21:09
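A minimal sketch of the chunk-list scheme described in this comment, under several assumptions: the 4096-byte static `heap` stands in for memory the kernel has reserved, the names `k_malloc`, `k_free` and `struct chunk` are hypothetical, free neighbours are never merged, and the code assumes a build without strict-aliasing optimizations (kernels commonly use `-fno-strict-aliasing`).

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_MAGIC 0xC0FFEE42u   /* arbitrary value used to spot corruption */

/* Header placed in front of every chunk; aligned and padded to 16 bytes. */
struct chunk {
    _Alignas(16) uint32_t magic;
    uint32_t used;   /* 1 if allocated, 0 if free */
    size_t size;     /* payload bytes that follow this header */
};

static _Alignas(16) unsigned char heap[4096];
static int heap_ready = 0;

/* The size field is what links the list: the next header starts right
 * after this chunk's payload. Returns NULL at the end of the heap. */
static struct chunk *next_chunk(struct chunk *c)
{
    unsigned char *p = (unsigned char *)(c + 1) + c->size;
    return p < heap + sizeof heap ? (struct chunk *)p : NULL;
}

void *k_malloc(size_t n)
{
    if (!heap_ready) {            /* first call: one big free chunk */
        struct chunk *c = (struct chunk *)heap;
        c->magic = CHUNK_MAGIC;
        c->used = 0;
        c->size = sizeof heap - sizeof *c;
        heap_ready = 1;
    }
    if (n == 0 || n > sizeof heap)
        return NULL;
    n = (n + 15) & ~(size_t)15;   /* keep every payload 16-byte aligned */

    for (struct chunk *c = (struct chunk *)heap; c != NULL; c = next_chunk(c)) {
        if (c->magic != CHUNK_MAGIC)
            return NULL;          /* header was overwritten: give up */
        if (c->used || c->size < n)
            continue;
        /* First fit. Split off the tail if it can hold a header plus
         * at least 16 bytes of payload. */
        if (c->size >= n + sizeof *c + 16) {
            struct chunk *rest = (struct chunk *)((unsigned char *)(c + 1) + n);
            rest->magic = CHUNK_MAGIC;
            rest->used = 0;
            rest->size = c->size - n - sizeof *c;
            c->size = n;
        }
        c->used = 1;
        return c + 1;             /* payload starts right after the header */
    }
    return NULL;                  /* no free chunk is large enough */
}

void k_free(void *p)
{
    if (p == NULL)
        return;
    struct chunk *c = (struct chunk *)p - 1;
    if (c->magic == CHUNK_MAGIC)
        c->used = 0;
}
```

Because freed chunks are never coalesced, this version fragments quickly; merging adjacent free chunks in `k_free` would be the natural next step.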
Thanks for your insightful answers, and everyone else's as well. Just one more thing... I realise process scheduling is done at a high level; I probably should've asked about context switching in particular. Is it an ASM-only task? Also, I'm not a hundred percent sure how the OS uses RAM. Does it invoke the mentioned low-level procedures every time it needs to store something in memory? How are those procedures made available in the first place? This is the last piece that is just not coming together... — Mark, Mar 06 '15 at 22:08
@Mark Scheduling has a critical part (the actual context switch) that must be done in ASM. On x86 it's usually done by using the IRET instruction creatively. The OS has pretty much free rein over the RAM (there are notable exceptions); that is, you can just do something like `*(char*)(0xB8000)=0x41` and it would work (in this case, assuming the hardware is set up right, it'd print an 'A' on the screen). Take an arbitrary address, and if it corresponds to actual RAM, you can just read from or write to it. There's nobody to tell you your pointers aren't valid when you're the one who manages the memory. — tux3, Mar 06 '15 at 22:15
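A slightly fuller sketch of the `0xB8000` write from this comment, assuming x86 VGA text mode with 80 columns. The function name `put_cell` is hypothetical. It takes the buffer base as a parameter so the same code can be tried against an ordinary array in user mode, where writing to the real address would simply crash.

```c
#include <stdint.h>

#define VGA_TEXT_BASE 0xB8000u  /* address from the comment; x86 text mode only */
#define VGA_COLS      80

/* Each screen cell is 16 bits: the low byte is the character code and
 * the high byte is the colour attribute. The pointer is volatile so the
 * compiler cannot drop or reorder stores it believes are unused. */
void put_cell(volatile uint16_t *base, int row, int col, char ch, uint8_t attr)
{
    uint16_t cell = (uint16_t)((unsigned char)ch | ((unsigned)attr << 8));
    base[row * VGA_COLS + col] = cell;
}
```

Inside a kernel, with the hardware set up as the comment assumes, the call would look like `put_cell((volatile uint16_t *)VGA_TEXT_BASE, 0, 0, 'A', 0x07);`, where attribute `0x07` is light grey on black.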

Ira Baxter · Answer 2 · 2015-03-06T20:38:13.740

About the only things you need to code in assembler are:

Context switches (swapping out the machine state of one abstract process for another)
Access to device registers (and you don't even need this if the devices are memory mapped)
Entry and exit from interrupt handlers (this is a kind of context switch)
Perhaps a boot loader

Everything else you should be able to do in C code.

If you want to see this job done spectacularly well, you should go and check out the Multics OS, dating from the mid-1960s, which supported large-scale information services (multiple CPUs, virtual memory, ...). It was coded almost entirely in PL/1 (a C-like language), with only very small bits coded in the native assembly language of the Honeywell processor that supported Multics. The Organick book on Multics is worth its weight in gold in terms of showing how Multics worked and how clean most of it is. (We got "Eunuchs" instead.)

There are some places where it will be worthwhile to code in assembler anyway. Regardless of the quality of your compiler's code generator, you will be able to hand-code certain routines in time-critical areas better in assembler than the compiler will. Places I'd expect this to matter: the scheduler, and system call entry and exit. Other places only as measurement indicates. (On older, much smaller systems, one tended to write the OS using a lot of assembler, but that was as much for space savings as for efficiency of execution; C compilers weren't nearly as good.)

I'd add to this list CPU mode switching, setting up of architecture-dependent data structures (think lidt/lgdt/lldt/ltr/... on x86), access to debug registers, performance counting, low-level hardware management through CPU port I/O, etc. There are actually quite a lot of things that are rarely used but that you couldn't really do without in a serious OS. — tux3, Mar 06 '15 at 20:32
I've built many OSes in my day and am very familiar with duking it out with the hardware. "Mode switching" I would include as a kind of "context switching" that involves some extra I/O port configuration. All that other stuff is just I/O. — Ira Baxter, Mar 06 '15 at 20:32
I meant [mode switching](https://i.imgur.com/dPYsATF.png) as in going to e.g. Virtual 8086 mode, or switching to long mode which requires writing to EFLAGS or some MSRs. No port I/O required, and not really switching to another task/context which would be done e.g. through an IRET instruction. But I get your point. — tux3, Mar 06 '15 at 20:35
Agreed, that kind of mode switching has to be done with assembly code; you can't "say it" in a "higher-level" (cough) language like C. — Ira Baxter, Mar 06 '15 at 20:39

rcgldr · Answer 3 · 2015-03-07T22:54:03.843

I'm wondering how a new architecture that's "out on the market" would not already have some type of operating system.

Device drivers - someone is going to have to write code for this, perhaps one driver for the BIOS and another for the OS. Memory-mapped I/O can get complicated depending on the hardware, such as a controller with a set of descriptors, each containing a physical address and a length. If the OS supports virtual memory, then that memory has to be "locked" and the physical addresses obtained in order to program the controller. This is one reason for having a set of descriptors: a single memory-mapped I/O operation can then handle scattered physical pages that have been mapped into a contiguous virtual address space.
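The descriptor idea can be sketched as follows. This is an illustration under assumptions, not any real controller's format: `struct sg_desc`, `build_sg_list`, the 4096-byte page size and the translation callback are all hypothetical, and `demo_translate` is an invented page mapping that exists only so the sketch can be exercised. The buffer is assumed to be locked in memory already.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */

/* One entry of the list handed to the (hypothetical) controller. */
struct sg_desc {
    uint64_t phys_addr;
    uint32_t length;
};

/* Hypothetical hook that maps a locked virtual address to its physical one. */
typedef uint64_t (*virt_to_phys_fn)(uintptr_t vaddr);

/* Walk a virtually contiguous buffer page by page and emit one descriptor
 * per physically contiguous run. Returns the number of descriptors
 * written, or 0 if len is 0 or max_desc is too small. */
size_t build_sg_list(uintptr_t vaddr, size_t len, virt_to_phys_fn translate,
                     struct sg_desc *out, size_t max_desc)
{
    size_t count = 0;
    while (len > 0) {
        size_t in_page = PAGE_SIZE - (vaddr % PAGE_SIZE);
        size_t piece = len < in_page ? len : in_page;
        uint64_t phys = translate(vaddr);

        if (count > 0 &&
            out[count - 1].phys_addr + out[count - 1].length == phys) {
            /* Physically adjacent to the previous piece: extend it. */
            out[count - 1].length += (uint32_t)piece;
        } else {
            if (count == max_desc)
                return 0;
            out[count].phys_addr = phys;
            out[count].length = (uint32_t)piece;
            count++;
        }
        vaddr += piece;
        len -= piece;
    }
    return count;
}

/* Invented mapping for demonstration: virtual pages 0, 1, 2 live in
 * physical frames 5, 6, 9, so the first two are adjacent and the third is not. */
uint64_t demo_translate(uintptr_t vaddr)
{
    static const uint64_t frame_of_page[] = { 5, 6, 9 };
    return frame_of_page[(vaddr / PAGE_SIZE) % 3] * PAGE_SIZE + vaddr % PAGE_SIZE;
}
```

With this mapping, a buffer covering all three virtual pages needs only two descriptors, because frames 5 and 6 are merged into one run.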

Assembly code - the other comments here have already noted that some assembly will be required (context switches and interrupt handlers, which could call C functions so that most of the code could still be in C).
