
When, say, three programs (executables) are loaded into memory, the layout might look something like this:

(Diagram of the three processes' memory layout: http://img97.imageshack.us/img97/3460/processesm.jpg)

I have the following questions:

  1. Is the concept of virtual memory limited to user processes? I am wondering where the operating system kernel and drivers live. What is their memory layout? I want to know more about the kernel-side memory. I know it is operating-system specific, so pick either Windows or Linux.

  2. Is virtual memory a per-process concept? Is it correct to say 4 GB for process1 + 4 GB for process2 + 4 GB for process3 = 12 GB of virtual memory across all processes? That doesn't sound right. Or is it that, out of a total 4 GB space, 1 GB is taken by the kernel and the remaining 3 GB is shared between all processes?

  3. They say that on a 32-bit machine, in a 4 GB address space, half of it (or, more recently, 1 GB) is occupied by the kernel. I can see in this diagram that "Kernel virtual memory" occupies 0xc0000000 - 0xffffffff (= 1 GB). Is that what they are talking about, or is it something else? I just want to confirm.

  4. What exactly does the Kernel Virtual Memory of each of these processes contain? What is its layout?

  5. When we do IPC we talk about shared memory. I don't see any memory shared between these processes. Where does it live?

  6. Resources (files, the registry on Windows) are global to all processes. So the resource/file handle table must live in some global space. Which area would that be in?

  7. Where can I learn more about this kernel-side stuff?

claws

3 Answers

  1. When a system uses virtual memory, the kernel uses virtual memory as well. Windows will use the upper 2GB (or 1GB if you've specified the /3GB switch in the Windows bootloader) for its own use. This includes kernel code, data (or at least the data that is paged in -- that's right, Windows can page out portions of the kernel address space to the hard disk), and page tables.

  2. Each process has its own virtual address space. When a process switch occurs, the page tables are swapped out for another process's page tables. This is simple to do on an x86 processor: changing the page table base address in the CR3 control register is enough, and the entire 4 GB address space is replaced by a completely different 4 GB address space (see the first sketch after this list). Having said that, typically there will be regions of the address space that are shared between processes. Those regions are marked in the page tables with special flags that tell the processor those areas do not need to be invalidated in its translation lookaside buffer.

  3. As I mentioned earlier, the kernel's code, data, and the page tables themselves need to be located somewhere. This information is located in the kernel address space. It is possible that certain parts of the kernel's code, data, and page tables can themselves be swapped out to disk as needed. Some portions are deemed more critical than others and are never swapped out at all.

  4. See (3)

  5. It depends. User-mode shared memory is located in the user-mode address space. Parts of the kernel-mode address space might very well be shared between processes as well. For example, it would not be uncommon for the kernel's code to be shared between all processes in the system. Shared memory is not guaranteed to appear at the same virtual address in every process: using arbitrary addresses here, shared memory located at 0x100000 in one process might be located at 0x101000 inside another process. Two pages in different address spaces, at completely different addresses, can point to the same physical memory (see the second sketch after this list).

  6. I'm not sure what you mean here. Open file handles are not global to all processes. The file system stored on the hard disk is global to all processes. Under Windows, file handles are managed by the kernel, and the objects are stored in the kernel address space and managed by the kernel object manager.

  7. For Windows NT based systems, I'd recommend Windows Internals, 5th edition, by Mark Russinovich and David Solomon.
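To make item 2 concrete, here is a minimal, hypothetical sketch of the address-space part of a context switch on x86. The struct and function names are invented for illustration; a real kernel (e.g. Linux's `switch_mm`) does considerably more bookkeeping:

```c
/* Hypothetical sketch of the address-space part of a context switch on x86.
 * The names here are invented; only the CR3 mechanism itself is real. */
struct mm {
    unsigned long pgd_phys;   /* physical address of the top-level page table */
};

static inline void load_page_table(unsigned long pgd_phys)
{
    /* Writing CR3 replaces the entire virtual address space in one step.
     * TLB entries not marked "global" are flushed automatically; shared
     * kernel mappings marked global are kept. */
    __asm__ volatile("mov %0, %%cr3" : : "r"(pgd_phys) : "memory");
}

void switch_address_space(struct mm *next)
{
    load_page_table(next->pgd_phys);
}
```

And for item 5, a small user-space sketch using POSIX shared memory (the object name `/demo_shm` is arbitrary): two processes that map the same object share the same physical pages, even though `mmap` may return different virtual addresses in each of them.

```c
/* Minimal POSIX shared-memory sketch. A second process that opens and maps
 * "/demo_shm" sees the same bytes, even if mmap() returns a different
 * virtual address there. Error handling omitted for brevity. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    strcpy(p, "visible to every process that maps /demo_shm");
    printf("mapped at %p in this process\n", (void *)p);

    munmap(p, 4096);
    close(fd);
    return 0;
}
```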

Response to comment:

And now is this 3 GB shared between all processes, or does each process get 4 GB of space?

It depends on the OS. Some kernels (such as the L4 microkernel) use the same page table for multiple processes and separate the address spaces using segmentation. On Windows each process gets its own page tables. Remember that even though each process might get its own virtual address space, that doesn't mean that the physical memory is always different. For example, the image for kernel32.dll loaded in process A is shared with kernel32.dll in process B. Much of the kernel address space is also shared between processes.

Why does each process have kernel virtual memory?

The best way to think of this is to ask yourself, "How would a kernel work if it didn't execute using virtual memory?" In this hypothetical situation, every time your program caused a context switch into the kernel (let's say you made a system call), virtual memory would have to be disabled while the CPU was executing in kernel space. There's a cost to doing that and there's a cost to turning it back on when you switch back to user space.

Furthermore, let's suppose that the user program passed in a pointer to some data for its system call. This pointer is a virtual address. With virtual memory turned off, that pointer would need to be translated to a physical address before the kernel could do anything with it. With virtual memory turned on, you get that for free thanks to the memory-management unit on the CPU; otherwise you would have to translate the addresses manually in software. There are all kinds of examples and scenarios I could describe (some involving hardware, some involving page table maintenance, and so on), but the gist of it is that it's much easier to have a homogeneous memory management scheme. If user space is using virtual memory, it's going to be easier to write a kernel if you maintain that scheme in kernel space. At least that has been my experience.
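As a concrete illustration of that last point, here is a hedged sketch in the style of a Linux system-call handler. The system call itself is made up, but `copy_from_user` is the real Linux helper: because the kernel runs with the calling process's page tables still loaded, it can consume the user-space pointer directly instead of translating addresses by hand.

```c
/* Sketch of a hypothetical system call that receives a user-space pointer.
 * The MMU translates ubuf through the current process's page tables, so the
 * kernel never walks page tables manually; copy_from_user() just adds
 * access checks and fault handling. */
#include <linux/uaccess.h>   /* copy_from_user() */
#include <linux/errno.h>

long example_syscall(const char __user *ubuf, size_t len)
{
    char kbuf[128];

    if (len > sizeof(kbuf))
        return -EINVAL;

    if (copy_from_user(kbuf, ubuf, len))
        return -EFAULT;       /* bad or unmapped user address */

    /* ... operate on kbuf in kernel space ... */
    return 0;
}
```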

There will be only one instance of the OS kernel, right? Then why does each process have separate kernel virtual space?

As I mentioned above, quite a bit of that address space will be shared across processes. There is per-process data that is in the kernel space that gets swapped out during a context switch between processes, but lots of it is shared because there is only one kernel.

Aaron Klotz
  • Thank you for the response, but somehow I'm still not clear, especially about the division of memory between the kernel and user processes. Is it that the total 4 GB space is divided between the kernel (1 GB) and the remaining 3 GB for user processes? And now is this 3 GB shared between all processes, or does each process get 4 GB of space? Why does each process have kernel virtual memory? There will be only one instance of the OS kernel, right? Then why does each process have separate kernel virtual space? – claws Mar 15 '10 at 19:23

To answer your questions, you need to understand more about the kernel and the techniques it employs to manage resources (CPU, memory, ...) and to provide elegant abstractions to application programs.

First, I want to make it clear that 'virtual memory' is a memory management technique employed by modern operating systems. It provides various benefits: process isolation (and thereby protection), the ability to run multiple programs together, and the ability to run programs larger than the physical memory present in the system. Within this technique there are two terms, 'virtual memory' and 'virtual address space', which are not the same but are closely related. (You may wonder how virtual memory can be both the technique and a concept within it, but that is correct, and you will see why below.)

In computer science, the word 'memory' has two meanings. The first is anything you can use to store data (registers, cache, RAM, ROM, HDD, etc.). The second is a synonym for primary memory, i.e., RAM. Substituting word for word, 'virtual memory' is nothing but 'virtual RAM': the total amount of space available at all times in the system into which programs are loaded for execution. That is simply the physical RAM plus the swap space on secondary storage set aside by the kernel. So if you have 2 GB of RAM and 4 GB of swap space allocated at installation time, the virtual memory of your system is 6 GB. I am not going to explain swap memory further here, as that would stray from the topic.

Moving on to virtual address space. To understand this you need to adjust your thinking a bit. As the word "virtual" suggests, this address space does not exist physically; it is an illusion the kernel presents to application programmers (to achieve the benefits mentioned above). Each process is given its own virtual address space by the kernel. (If there were no kernel in the system and you ran your application program directly on the hardware, it would use the physical address space, i.e., RAM, as its address space.) On a machine with 32-bit address registers, the kernel can provide a virtual address space of 2^32 = 4 GB to each process. (This range therefore changes with the hardware architecture; recent processors with 48-bit address registers can provide a virtual address space of 2^48 = 256 TB.) Importantly, this virtual address space is just "in the air". You might wonder how the process's code and data can be executed at all if the space is not real: they must be mapped to physical memory, and how that mapping is managed by the kernel is the concept called paging. Now you can see how the kernel achieves process isolation with virtual address spaces. Every address a process can generate lies between 0 and 4 GB (assuming 32-bit address registers for simplicity), entirely within its own space, and the process knows nothing about any other process running in the system. It is as if each process is packed into a separate space.
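A quick way to see this "separate space per process" idea from user space is a small POSIX C demo (nothing kernel-specific is assumed): after `fork()`, parent and child print the same virtual address for the same variable, yet a write in one is invisible to the other, because that address resolves to different physical pages.

```c
/* Demonstrates per-process virtual address spaces: identical virtual
 * addresses in parent and child, but independent contents. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int x = 1;

    if (fork() == 0) {          /* child */
        x = 42;                 /* touches only the child's copy (copy-on-write) */
        printf("child : &x=%p x=%d\n", (void *)&x, x);
        exit(0);
    }
    wait(NULL);
    printf("parent: &x=%p x=%d\n", (void *)&x, x);   /* still prints x=1 */
    return 0;
}
```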

Kernel code is also just another entity. If the kernel resided in an entirely different address space, there would be no way for application programs to interact with it, and if applications cannot communicate with the kernel (and vice versa), the kernel is of no use in driving the system. So the question is: how do application processes interact with the kernel? One option: if the kernel code is present in the virtual address space of the application process, the two can interact. That is why the kernel is mapped into every process's virtual address space; every process needs to communicate with the kernel. Don't worry, the kernel code is not physically duplicated for each process. As mentioned earlier, the virtual address space is just an illusion, so there is only one copy of the kernel code in physical memory, and it is referenced from every virtual address space (through paging). In the case of Linux, the kernel is placed in the upper part of the address space, from 0xC0000000 to 0xFFFFFFFF (which is why 1 GB is reserved for the kernel in the VAS), and the remaining 3 GB (from 0x00000000 to 0xBFFFFFFF) is left for the application program to use. The part of the virtual address space where the kernel resides is known as kernel space, and the part where the application program resides is called user space.

If you have been following carefully, you might now ask: if both application code and kernel code reside in the same virtual address space, and the kernel sits at a well-known address range, can't application code corrupt the kernel? At first glance it looks possible, but it cannot, because this is protected with hardware help. The processor has a flag that indicates whether the current execution mode is SUPERVISOR MODE or USER MODE. Kernel-space code executes in SUPERVISOR MODE (with that flag set appropriately) and user-space code executes in USER MODE. If you are in USER MODE and try to access or modify code or data in kernel space, an exception is thrown. (The processor detects this from the address the instruction is trying to access: if it is at or above 0xC0000000, the instruction is touching kernel space, and the current execution mode, USER MODE, does not have the required permission.) Just a note: in SUPERVISOR MODE the processor also provides access to an additional set of instructions.
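For example, this deliberately broken user-space program (the address is an arbitrary kernel-space address for a 32-bit 3G/1G split) would be killed with a segmentation fault rather than being allowed to read kernel memory:

```c
/* Attempting to touch a kernel-space address from user mode: the processor
 * raises a fault, and the kernel delivers SIGSEGV to the process. */
#include <stdio.h>

int main(void)
{
    volatile unsigned int *kernel_addr = (unsigned int *)0xC0100000UL;

    printf("about to read %p from user mode...\n", (void *)kernel_addr);
    unsigned int v = *kernel_addr;   /* faults here: SIGSEGV */
    printf("read %u\n", v);          /* never reached */
    return 0;
}
```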

I hope that if you understand this concept, you can answer your questions for yourself. I have answered many of them directly while explaining the concept itself.

Darshan L
  • "How do application processes interact with the kernel?" I thought it would be possible via some syscalls. I still don't get why the kernel has to be part of the application's virtual address space (even if it's not actually duplicated in physical memory). – mauron85 Apr 08 '18 at 13:58
  • Also it seems not all OSes are doing kernel to user space mapping: "Mac OS X does not map the kernel into each user address space, and therefore each user/kernel transition (in either direction) requires an address space switch." https://flylib.com/books/en/3.126.1.91/1/ – mauron85 Apr 08 '18 at 14:09
  • @mauron85: "How do application processes interact with the kernel?" - Yes, you are right, the only way is through **system calls**. "I still don't get why the kernel has to be part of the application's virtual address space" - The processor is a dumb unit (with all due respect): whatever instruction you feed it, it executes; it has no intelligence of its own. The only way you can tell the processor to execute kernel code is to give it an address where the kernel code resides (by placing the next instruction's address in the **program counter** register). – Darshan L Apr 25 '18 at 12:38
  • And the address that goes into the program counter is a virtual address, so to pass control to kernel code, that code must be somewhere in the same virtual address space. Simple! [However, to execute kernel code the processor must be running with a different privilege, kernel-mode privilege, and I guess _that is what the system call does: change the privilege from user mode to kernel mode_.] – Darshan L Apr 25 '18 at 12:52
  • All my understanding is with respect to the Linux kernel; I am not sure how Mac OS X does it. Maybe what Linux does (which is what I explained) is just one way of achieving this. – Darshan L Apr 25 '18 at 12:56
  • Additional information: **"How do you restrict application code from directly accessing kernel code?"** There is hardware support for this. Processors can run in multiple privilege modes. I believe the x86 architecture supports 4 modes, but only 2 of them are used by Linux: user mode and kernel mode. You can program the hardware so that a particular portion of the (virtual) address space may only be executed in a particular privilege mode. That is why, in Linux by default, the upper 1 GB of the 4 GB virtual address space is reserved for the kernel... _cont_ – Darshan L Apr 25 '18 at 13:11
  • ... and if a user program directly tries to access code in that upper 1 GB of the address space (reserved for the kernel), the processor raises an exception. – Darshan L Apr 25 '18 at 13:15
  • @mauron85: Yes, it is via syscalls. e.g. if user-space makes a `write(1, buf, 1024)` system call, control transfers to kernel code in kernel mode (but without any changes to the page tables). The kernel code will then need to copy 1024 bytes from the user-space buffer into somewhere appropriate in kernel space. In Linux, that means calling `copy_from_user`, which does some permission checking, but does ultimately use the user-space pointer passed to the system call as the source address for what's basically a memcpy inside the kernel (e.g. to the pagecache if writing to a file). – Peter Cordes Mar 15 '22 at 15:55
  • @mauron85: In user mode, the CPU only allows read / write of user pages. But in kernel mode, the CPU allows read / write of both kernel and user addresses. So really most of kernel space only needs to be mapped when the kernel has control. (And to mitigate the Meltdown vulnerability, this is in fact what modern kernels do, only keeping key parts of kernel space mapped, like the interrupt descriptor table and a small "trampoline" chunk of code that changes the page tables before jumping into the kernel proper, so most of it can be unmapped while user-space is running.) – Peter Cordes Mar 15 '22 at 16:00
  • @PeterCordes I still don't quite get why kernel space needs to be part of virtual address space of a process. If user program wants to do some privileged instructions, it can request kernel's service via sys call. During this process, I don't see anything particular that requires kernel code/data and user process to share the same address space – torez233 Jul 05 '22 at 01:00
  • @torez233: Before the Meltdown / Spectre vulnerabilities, there was no need to change page tables when switching from user to kernel mode (e.g. for an interrupt or system call), so at least for efficiency you always wanted kernel space mapped in the current page tables, but with a permission bit that only allowed access if the CPU was running in kernel mode. Even now, at least some kernel code/data needs to be mapped to valid virtual addresses while user-space is running, namely the interrupt table that the CPU reads, and code for interrupt handlers. (At least stubs that switch page tables). – Peter Cordes Jul 05 '22 at 01:09
  • @torez233: Kernel code does frequently need access to user-space, though, so it still needs room to keep user-space mapped. Many system calls pass a user-space pointer to the kernel, which it needs to be able to dereference. (e.g. `read`, `write`, `open` (for the filename), `getdents`, `sigprocmask`, `nanosleep`, `futex`, etc.) Unless you want the kernel to manually walk the page tables to translate those to physical addresses, then use a kernel virtual address in the direct-mapped region to access that memory, accounting for page boundaries that a pathname might be split across... – Peter Cordes Jul 05 '22 at 01:14
  • @PeterCordes Thanks for this detailed explanation. One question on "at least some kernel code/data needs to be mapped to valid virtual addresses while user-space is running, namely the interrupt table that the CPU reads, and code for interrupt handlers": is it because, if we don't have the interrupt handlers mapped in the currently running process's address space, then when an interrupt occurs (e.g. a timer interrupt) the processor won't be able to know which code to execute next? – torez233 Jul 05 '22 at 03:35
  • @torez233: Yes, exactly. On x86, the CPU knows the *virtual* address of the [IDT](https://wiki.osdev.org/Interrupt_Descriptor_Table) and [GDT](https://wiki.osdev.org/Global_Descriptor_Table), both of which it needs to read to find the virtual address of the interrupt handler. Then it needs to fetch kernel code from that virtual address, all with the same page-tables that user-space was using. (Also needs to access the TSS to change RSP to point at the kernel stack, but it *doesn't* change page tables for you. The design intent was to only change page tables on context-switch to another task) – Peter Cordes Jul 05 '22 at 03:47
  • @PeterCordes "the CPU knows the virtual address of the IDT and GDT" If what the CPU remembers about trap tables is their _virtual_ addresses, then it makes sense to me that those structures need to be mapped into the current process's address space, since a virtual address is only meaningful in the context of that process's address space. But upon a context switch, does the kernel have to update the addresses the CPU remembers by pushing/popping them onto the PCB/kernel stack? If so, why not just let the CPU remember the actual physical address of the trap table, as it won't vary across processes? – torez233 Jul 05 '22 at 04:17
  • @torez233: The virtual addresses don't vary either; the intended design is essentially what OSes actually do: part of virtual address space is reserved by the kernel, and mapped to the *same* physical pages in the page tables for every task. (x86 even has a "global" bit in the PTE, to tell the CPU it can keep these mappings cached in the TLB across a `mov` to CR3 changing the top-level page-table pointer.) – Peter Cordes Jul 05 '22 at 04:33
  • @torez233: What you suggest would work as a design, but happens not to be what was chosen. Generally everything uses virtual addresses, except by necessity the page table internals themselves. Having x86 `lidt` / `lgdt` only remember the physical address would mean that `sidt` (to ask the CPU for the address) wouldn't give you an address you could use for data load/store. And you'd still need some kernel code/data at fixed virtual addresses so the CS:RIP loaded from the IDT can be used, so there's no getting around it. – Peter Cordes Jul 05 '22 at 04:39
  • Another semi-related design reason is that x86's LDT (local descriptor table) could be in per-process memory, so changing page tables on context-switch changes LDT without having to re-run an `lldt` instruction. The CPU probably uses the same internals for reading a GDT or LDT entry, since `mov` to a segment register can use one or the other depending on a bit in the selector. So it would be inconvenient for one to be physical and another to be virtual. – Peter Cordes Jul 05 '22 at 04:40
  • (Technically, you could have sets of page tables with different IDT and GDT, if you wanted to get creative with an OS. Perhaps as a primitive kind of semi-virtualization without full HW virtualization.) – Peter Cordes Jul 05 '22 at 09:26

Cited from CS:APP (Computer Systems: A Programmer's Perspective), 3rd edition, chapter 9:


The kernel virtual memory contains the code and data structures in the kernel. Some regions of the kernel virtual memory are mapped to physical pages that are shared by all processes. For example, each process shares the kernel’s code and global data structures. Interestingly, Linux also maps a set of contiguous virtual pages (equal in size to the total amount of DRAM in the system) to the corresponding set of contiguous physical pages. This provides the kernel with a convenient way to access any specific location in physical memory—for example, when it needs to access page tables or to perform memory-mapped I/O operations on devices that are mapped to particular physical memory locations.

Other regions of kernel virtual memory contain data that differ for each process. Examples include page tables, the stack that the kernel uses when it is executing code in the context of the process, and various data structures that keep track of the current organization of the virtual address space.
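That "set of contiguous virtual pages mapped to all of DRAM" is what Linux calls the direct map. Below is a hedged kernel-side sketch of how it is typically used; `virt_to_phys()`/`phys_to_virt()` are real Linux helpers, while the surrounding function is purely illustrative.

```c
/* Sketch: converting between a directly-mapped kernel virtual address and
 * the physical address behind it. Only valid for memory in the direct map
 * (e.g. kmalloc allocations), not for vmalloc or user pointers. */
#include <linux/kernel.h>
#include <asm/io.h>          /* virt_to_phys(), phys_to_virt() */

static void direct_map_demo(void *kernel_buffer)
{
    phys_addr_t phys = virt_to_phys(kernel_buffer);  /* virtual -> physical */
    void *virt_again = phys_to_virt(phys);           /* physical -> virtual */

    pr_info("virt=%p phys=%pa back=%p\n", kernel_buffer, &phys, virt_again);
}
```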


An example of a per-process data structure is task_struct.

The process context needed for a context switch is also stored per process.

It consists of the values of objects such as the general-purpose registers, the floating-point registers, the program counter, user’s stack, status registers, kernel’s stack, and various kernel data structures such as a page table that characterizes the address space, a process table that contains information about the current process (PIDs), and a file table that contains information about the files that the process has opened.

Signal bit vectors are stored as well.
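As a rough mental model only (the real Linux `task_struct` has hundreds of fields; every name below is invented), the per-process state described above might be sketched like this:

```c
/* Heavily simplified, hypothetical process control block. */
struct file;   /* opaque: stands in for whatever describes an open file */

struct pcb {
    int            pid;
    unsigned long  regs[16];          /* saved general-purpose registers */
    unsigned long  program_counter;
    unsigned long  user_stack_ptr;
    unsigned long  kernel_stack_ptr;
    unsigned long  page_table_base;   /* e.g. the value loaded into CR3 */
    struct file  **open_files;        /* per-process open file table */
    unsigned long  pending_signals;   /* signal bit vector */
};
```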


Related answer:

  1. Does each process have its own kernel stack?

Not just each process - each thread has its own kernel stack (and, in fact, its own user stack as well). Remember the only difference between processes and threads (to Linux) is the fact that multiple threads can share an address space (forming a process).

Izana
  • The kernel part of virtual address space is documented in the Linux source. e.g. for x86-64, https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt – Peter Cordes Mar 15 '22 at 15:48