48

It makes sense that something like an operating system would be written in C. But how much of it, and what kind of C? I mean, in C, if you needed some heap memory, you would call malloc. But does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something. What about a call stack? The OS is responsible for setting up all of this stuff that other applications use, but how does it do that? When you want to open or create a file in C, the appropriate functions ask the operating system for that file. So what kind of C is on the other side of that call? Or on the other end of a memory allocation?

Also, how much of an operating system would actually be written in C? All of it? What about architecture dependent code? What about the higher levels of abstraction--does that ever get written in higher level languages, like C++?

I mean, I'm just asking this out of sheer curiosity. I'm downloading the latest Linux kernel now but it's taking forever. I'm not sure if I'll wind up being able to follow the code--or if I'll be caught in an inescapably complex web of stuff I've never seen before.

Carson Myers
  • Maybe this is of interest: [http://stackoverflow.com/questions/43180/how-to-get-started-in-operating-system-development](http://stackoverflow.com/questions/43180/how-to-get-started-in-operating-system-development) – bang Jul 08 '09 at 07:57
  • 1
    Voting to close as too broad or opinion based. If it were "What is OS XXX written in" it would be answerable. For any OS, any compiled language with `asm()` can be used. – Ciro Santilli OurBigBook.com Aug 15 '15 at 20:27
  • 1
    "active 6 years ago" :/ – Carson Myers Aug 15 '15 at 20:30

9 Answers

48

Excellent questions, all. The answer is: little to none of the standard C library is available in the "dialect" of C used to write an operating system. In the Linux kernel, for example, the standard memory allocation functions (malloc, free, etc.) are replaced with special kernel-internal memory allocation functions, kmalloc and kfree, with special restrictions on their use. The operating system must provide its own "heap": in the Linux kernel, physical memory pages that have been allocated for kernel use must be non-pageable and are often physically contiguous. See this Linux Journal article on kmalloc and kfree. Similarly, the operating system kernel maintains its own special call stack, the use of which, as far as I remember, requires special support from the GCC compiler.
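To make the "kernel provides its own heap" idea concrete, here is a minimal sketch of a bump allocator over a fixed region the kernel owns. This is not how kmalloc works internally; the names `bump_alloc`, `bump_reset`, `heap_words` and the 4 KiB size are all made up for illustration.

```c
#include <stddef.h>

/* Hypothetical sizes and names; a real kernel allocator is far more involved. */
#define HEAP_SIZE 4096u
#define HEAP_ALIGN 8u

/* Memory the "kernel" owns outright; no malloc is involved anywhere. */
static unsigned long long heap_words[HEAP_SIZE / sizeof(unsigned long long)];
static size_t heap_used;   /* bytes handed out so far */

/* Return size bytes (rounded up to 8), or NULL when the region is exhausted. */
void *bump_alloc(size_t size)
{
    size_t aligned = (size + (HEAP_ALIGN - 1u)) & ~(size_t)(HEAP_ALIGN - 1u);

    /* aligned < size catches wrap-around for absurdly large requests */
    if (aligned < size || aligned > HEAP_SIZE - heap_used)
        return NULL;

    void *p = (unsigned char *)heap_words + heap_used;
    heap_used += aligned;
    return p;
}

/* A bump allocator cannot free single blocks; it can only start over. */
void bump_reset(void)
{
    heap_used = 0;
}
```

The point is only that the allocator hands out pieces of memory it already controls, instead of asking someone else for it. Freeing individual blocks needs extra bookkeeping (free lists, slabs and so on), which this sketch leaves out.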

Also, how much of an operating system would actually be written in C? All of it?

As far as I'm aware, operating systems are overwhelmingly written in C. Some architecture-specific features are coded in assembler, but usually as little as possible, to improve portability and maintainability: the Linux kernel has some assembler but tries to minimize it.

What about architecture dependent code? What about the higher levels of abstraction--does that ever get written in higher level languages, like C++?

Usually the kernel will be written in pure C, but sometimes the higher level frameworks and APIs are written in a higher level language. For example, the Cocoa framework/API on macOS is written in Objective-C, and the BeOS higher level APIs were written in C++. Much of Microsoft's .NET framework was written in C#, with the "Common Language Runtime" written in a mix of C++ and assembler. The Qt widget set most often used on Linux is written in C++. Of course, this introduces philosophical questions about what counts as "the operating system."

The Linux kernel is definitely worth looking at for this, although, it must be said, it is huge and intimidating for anyone to read from scratch.

cygil
  • 3
    When you say the kernel is usually written in C, this omits one of the key operating systems, namely Windows, which has a hybrid C and C++ kernel: the executive (NT) part is written in C, but the Win32 part of the kernel (which mostly resides in win32k.sys and is responsible for fonts, windows, graphics, DirectX, etc.) is mainly written in C++. – SecurityMatt Dec 05 '12 at 21:37
34

What kind of C?

Mostly ANSI C, with a lot of time looking at the machine code it generates.

But, does an OS even have a heap?

malloc asks the operating system for a pointer to some memory it is allowed to use. If a program running on an OS (in user mode) tries to access memory it doesn't own, it gets a segmentation fault. The OS itself is allowed to access all the physical memory on the system directly: it doesn't need malloc, and it won't get a segmentation fault on any address that exists.

What about a call stack?

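The call stack often works at the hardware level: the CPU keeps a stack pointer register, and a call instruction saves the return address either on the stack (as on x86) or in a link register (as on ARM).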

For file access, the OS needs access to a disk driver, which needs to know how to read the file system that's on the disk (there are a lot of different kinds). Sometimes the OS has one built in, but I think it's more common that the boot loader hands it one to start with, and it then loads another (bigger) one. The disk driver has access to the hardware I/O of the physical disk, and builds from that.
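As a rough sketch of that layering, a driver can expose sector reads and writes through function pointers, and the file system code only ever talks to that interface. Everything here is hypothetical (`struct block_dev`, `ramdisk`, the 8-sector size), and a RAM array stands in for the real hardware I/O.

```c
#include <stdint.h>
#include <string.h>   /* memcpy; a kernel would supply its own version */

#define SECTOR_SIZE 512u
#define RAM_SECTORS 8u

/* Hypothetical driver interface: the file system only sees these calls. */
struct block_dev {
    uint32_t sector_count;
    int (*read)(struct block_dev *dev, uint32_t lba, void *buf);
    int (*write)(struct block_dev *dev, uint32_t lba, const void *buf);
};

/* RAM-backed "disk" standing in for a real controller. */
static unsigned char ram_sectors[RAM_SECTORS][SECTOR_SIZE];

/* Copy one sector out of the backing store; -1 if lba is out of range. */
static int ram_read(struct block_dev *dev, uint32_t lba, void *buf)
{
    if (lba >= dev->sector_count)
        return -1;
    memcpy(buf, ram_sectors[lba], SECTOR_SIZE);
    return 0;
}

/* Copy one sector into the backing store; -1 if lba is out of range. */
static int ram_write(struct block_dev *dev, uint32_t lba, const void *buf)
{
    if (lba >= dev->sector_count)
        return -1;
    memcpy(ram_sectors[lba], buf, SECTOR_SIZE);
    return 0;
}

struct block_dev ramdisk = { RAM_SECTORS, ram_read, ram_write };
```

A file system layer would call `ramdisk.read(&ramdisk, lba, buf)` without knowing what sits behind it. Swapping in a real disk driver means providing different `read` and `write` functions.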

Michael Sofaer
25

C is a very low level language, and you can do a lot of things directly. Any of the C library functions (like malloc, printf, clrscr, etc.) need to be implemented first before you can invoke them from C (have a look at libc concepts, for example). I'll give an example below.

Let us see how the C library functions are implemented under the hood. We'll go with a clrscr example. When you implement such functions, you access system devices directly. For example, for clrscr (clearing the screen) we know that on a PC in text mode the video memory resides at 0xB8000. Hence, to write to the screen or to clear it, we start by assigning a pointer to that location.

In video.c

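void clrscr(void)
{
   /* VGA text-mode buffer; volatile so the compiler keeps every store */
   volatile unsigned char *vidmem = (volatile unsigned char *)0xB8000;
   const long size = 80*25;   /* 80 columns x 25 rows of cells */
   long loop;

   for (loop=0; loop<size; loop++) {
      *vidmem++ = 0;      /* character byte: blank */
      *vidmem++ = 0xF;    /* attribute byte: white on black */
   }
}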
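The same two-bytes-per-cell layout can be used to put a single character on the screen. The sketch below takes the buffer as a parameter so it can be tried against an ordinary array; on a PC in text mode you would pass `(volatile uint16_t *)0xB8000`. The names `vga_cell` and `vga_put_at` are made up, and the packing assumes a little-endian machine such as x86.

```c
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Pack a character (low byte) and an attribute (high byte) into one cell. */
uint16_t vga_cell(char c, uint8_t attr)
{
    return (uint16_t)((unsigned char)c | ((unsigned)attr << 8));
}

/* Write c at (row, col). Returns 0 on success, -1 if the position is off screen. */
int vga_put_at(volatile uint16_t *buf, int row, int col, char c, uint8_t attr)
{
    if (row < 0 || row >= VGA_ROWS || col < 0 || col >= VGA_COLS)
        return -1;
    buf[row * VGA_COLS + col] = vga_cell(c, attr);
    return 0;
}
```

A printf-like routine is then mostly a loop over `vga_put_at` plus a cursor position, which is the kind of thing a libc would normally hide from you.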

Let us write our mini kernel now. This will clear the screen when the control is handed over to our 'kernel' from the boot loader. In main.c

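void clrscr(void);   /* defined in video.c */

void main(void)
{
   clrscr();    /* blank the screen once the boot loader jumps here */
   for(;;);     /* there is nothing to return to, so spin forever */
}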

To build our 'kernel', you might use gcc to compile it and then convert the result to a flat binary.

gcc -ffreestanding -c main.c -o main.o
gcc -ffreestanding -c video.c -o video.o
ld -e _main -Ttext 0x1000 -o kernel.o main.o video.o
ld -i -e _main -Ttext 0x1000 -o kernel.o main.o video.o
objcopy -R .note -R .comment -S -O binary kernel.o kernel.bin

If you look at the ld parameters above, you'll see that we are specifying the default load location of the kernel as 0x1000. Now you need to create a boot loader. From your boot loader logic, you pass control to your kernel, like this:

jmp 08h:01000h

You normally write your boot loader logic in assembly. Even before that, you may need to have a look at how a PC boots.

It is better to start with a smaller operating system to explore. See this Roll Your Own OS tutorial:

http://www.acm.uiuc.edu/sigops/roll_your_own/

amazedsaint
5

But how much of it, and what kind of C?

Some parts must be written in assembly.

I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something.

Some OSes have a heap. At the lowest level, memory is doled out in fixed-size slabs called pages. Your C library then partitions those pages into variable-sized blocks with its own scheme, which is what malloc hands you. You should learn about virtual memory, which is a common memory scheme in modern OSes.
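A minimal sketch of doling out pages, assuming a tiny machine with 64 pages tracked in a single bitmap. The names `page_alloc`, `page_free` and `page_offset` are hypothetical, and real kernels use more elaborate structures than a linear scan.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 64u          /* hypothetical: 256 KiB of managed memory */

static uint64_t page_bitmap;   /* bit i set means page i is in use */

/* Find a free page, mark it used, and return its index, or -1 if none is left. */
int page_alloc(void)
{
    for (unsigned i = 0; i < NUM_PAGES; i++) {
        uint64_t mask = (uint64_t)1 << i;
        if (!(page_bitmap & mask)) {
            page_bitmap |= mask;
            return (int)i;
        }
    }
    return -1;
}

/* Mark a page as free again; out-of-range indexes are ignored. */
void page_free(int page)
{
    if (page >= 0 && (unsigned)page < NUM_PAGES)
        page_bitmap &= ~((uint64_t)1 << page);
}

/* Byte offset of a page from the start of the managed region. */
uint32_t page_offset(int page)
{
    return (uint32_t)page * PAGE_SIZE;
}
```

A user-space malloc sits on top of something like this: it asks the OS for whole pages and then carves them into the variable-sized blocks that programs request.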

When you want to open or create a file in C, the appropriate functions ask the operating system for that file. so... What kind of C is on the other side of that call?

You call into assembly routines that query hardware with instructions like IN and OUT. With raw memory access, you sometimes have regions of memory that are dedicated to communicating to and from hardware. This is called memory-mapped I/O. (DMA is a related mechanism, in which the device itself reads and writes main memory.)
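For the memory-region case, C code reaches the device through volatile pointers to its registers. Here is a sketch for an imaginary device with a status register and a data register; the layout, the ready bit and every name (`struct dev_regs`, `dev_send_byte`, `DEV_STATUS_READY`) are assumptions for illustration, not any real device.

```c
#include <stdint.h>

/* volatile forces a real load or store on every access, so the compiler
   cannot cache or drop the register traffic. */
uint32_t mmio_read32(volatile uint32_t *reg)
{
    return *reg;
}

void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Hypothetical device: bit 0 of status means "ready for the next byte". */
#define DEV_STATUS_READY 0x1u

struct dev_regs {
    volatile uint32_t status;
    volatile uint32_t data;
};

/* Send one byte if the device is ready. Returns 0 on success, -1 if busy. */
int dev_send_byte(struct dev_regs *regs, uint8_t byte)
{
    if (!(mmio_read32(&regs->status) & DEV_STATUS_READY))
        return -1;
    mmio_write32(&regs->data, byte);
    return 0;
}
```

In a kernel, `regs` would point at the physical address the hardware documentation gives for the device. Here it can point at an ordinary struct, which is handy for trying the logic without hardware.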

I'm not sure if I'll wind up being able to follow the code--or if I'll be caught in an inescapably complex web of stuff I've never seen before.

Yes, you will. You should pick up a book on hardware and OSes first.

Unknown
2

I mean, in C, if you needed some heap memory, you would call malloc. But, does an OS even have a heap? As far as I know, malloc asks the operating system for memory and then adds it to a linked list, or binary tree, or something. What about a call stack?

A lot of what you say in your question is actually done by the runtime library in userspace.

All the OS needs to do is load the program into memory and jump to its entry point; most details after that can be handled by the user space program. The heap and the stack are just areas of the process's virtual memory. The stack is tracked by a pointer register in the CPU.

Allocating physical memory is something that is done at the OS level. The OS usually allocates fixed-size pages, which are then mapped into a user space process.
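The mapping step can be sketched with a toy single-level page table: split the virtual address into a page number and an offset, look up the physical frame, and glue the offset back on. The 4 KiB page size, the 16-entry table and the `pt_*` names are assumptions; real MMUs use multi-level tables walked by hardware.

```c
#include <stdint.h>

#define VM_PAGE_SHIFT 12u                     /* 4 KiB pages */
#define VM_PAGE_SIZE  (1u << VM_PAGE_SHIFT)
#define VM_NUM_PAGES  16u                     /* hypothetical tiny address space */
#define VM_UNMAPPED   0xFFFFFFFFu

/* Virtual page number -> physical frame number. */
static uint32_t page_table[VM_NUM_PAGES];

/* Mark every virtual page as unmapped. */
void pt_init(void)
{
    for (uint32_t i = 0; i < VM_NUM_PAGES; i++)
        page_table[i] = VM_UNMAPPED;
}

/* Map one virtual page to a physical frame. Returns -1 if vpage is out of range. */
int pt_map(uint32_t vpage, uint32_t frame)
{
    if (vpage >= VM_NUM_PAGES)
        return -1;
    page_table[vpage] = frame;
    return 0;
}

/* Translate vaddr. Returns 0 and stores the physical address,
   or -1 where real hardware would raise a page fault. */
int pt_translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpage  = vaddr >> VM_PAGE_SHIFT;
    uint32_t offset = vaddr & (VM_PAGE_SIZE - 1u);

    if (vpage >= VM_NUM_PAGES || page_table[vpage] == VM_UNMAPPED)
        return -1;
    *paddr = (page_table[vpage] << VM_PAGE_SHIFT) | offset;
    return 0;
}
```

With virtual page 1 mapped to frame 7, the address 0x1234 translates to 0x7234: the page number changes and the offset within the page stays the same.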

John Smith
2

You should read Linux Device Drivers, 3rd Edition. It explains the internals of the Linux kernel pretty well.

fa.
1

I wouldn't start by reading the Linux kernel; it's too complicated for beginners.

Osdev is an excellent place to start reading. I wrote a small OS for a school subject using information from Osdev. It runs on VMware, Bochs, and QEMU, so it's easy to test. Here is the source code.

Macarse
0

Traditionally, C is mostly needed for the kernel and device drivers due to interaction with hardware. However, languages such as C++ and Java could be used for the entire operating system.

For more information, I've found Operating Systems Design and Implementation by Andrew Tanenbaum particularly useful, with LOTS of code samples.

segfault
  • Explain how you would compile Java, which has no concept of memory addresses, to drive hardware devices. How would you compile it down to machine code without an intermediate step that compiles something to a native language? – Spence Jul 08 '09 at 08:45
  • Java can definitely be used for the user-level API, but for various reasons, including a lack of low-level constructs like "raw" memory pointers and its garbage-collected nature, it makes a poor choice for implementing the lowest level, the Java Virtual Machine. Sun's JVM is written, interestingly enough, in C++. Supposedly the Maxine project (https://maxine.dev.java.net/design.html) uses pure Java with a bit of assembler, and suggests that system programming is now possible with the features of Java 5, but I suspect some extensions and hackery were required. – cygil Jul 08 '09 at 10:17
  • Java is an entire ecosystem. When people talk about writing an OS in Java, they mean just the language syntax, not all the other parts. There is no JVM, no classloader, neither bytecode interpreter nor JIT compiler, no standard Java class library available to the OS. Instead, the Java source code will have to be compiled to machine code by a host system, and that host compiler will be responsible for things like forcing certain variables to certain addresses. It varies whether the entire OS is compiled to machine code or just the bootloader, bytecode interpreter, and hardware abstraction. – Ben Voigt May 24 '12 at 15:23
0

malloc and the other memory management functions aren't keywords in C. They are functions from the standard OS libraries. I don't know the name of the standard (it is unlikely to be POSIX; I haven't found any mention of it there), but it exists: you can use malloc in C applications on most platforms.

If you want to know how the Linux kernel works, I recommend this book: http://oreilly.com/catalog/9780596005658/ . I think it's a good explanation, with some C code included :).

cetron
  • 1
    malloc is part of the ANSI C standard library, which is required to ship as part of the C runtime library. Most OSes choose to ditch the entire C runtime library and build their own, because there isn't a "behind" to allocate memory from. In Linux the heap allocation function is kmalloc. In Windows it is ExAllocatePool. – SecurityMatt Dec 05 '12 at 21:34