In Linux, and in most other general-purpose operating systems in current use, memory is not a single linear array at all: the underlying physical memory is managed at page granularity using virtual memory.
Essentially, each process has its own virtual address space. Most of it is empty and unmapped -- trying to access it leads to a segmentation fault or general protection fault, normally killing the process -- and the process can only access memory that the kernel has explicitly set up to be accessible to it.
In most cases, a process cannot access kernel memory directly, either. To execute a syscall -- for example, to open, read, or write a file or device -- the processor core essentially does a context switch to kernel mode, where the kernel data structures and the memory used by the current process in userspace are both accessible (but not necessarily at the same virtual addresses in kernel space as in userspace).
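To make that concrete, here is a minimal sketch (assuming a POSIXy system with mmap() available): the process asks the kernel to map one page, uses it freely, and once the mapping is torn down any further access would get the process killed with SIGSEGV.

#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS with a strict -std= setting */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask the kernel to make one anonymous, zero-filled page accessible. */
    size_t len = 4096;
    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* This access is fine: the page is now part of our address space. */
    strcpy(page, "hello");
    printf("mapped page at %p contains \"%s\"\n", (void *)page, page);

    /* Tear the mapping down again. */
    munmap(page, len);

    /* Dereferencing 'page' here would touch unmapped memory, and the
       kernel would deliver SIGSEGV, killing the process -- so it is
       left commented out:
       printf("%s\n", page);
    */
    return 0;
}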
This means that the memory accessible to each process is actually quite scattered and discontinuous nowadays:
╔════════╗ ╔════════╗ ╔═══════╗
║ Code ║ ║ Data ║ ║ Stack ║
╚════════╝ ╟────────╢ ╚═══════╝
╔════════╗ ║ BSS ║
║ ROdata ║ ╟────────╢
╚════════╝ ║ Heap ║
╔════════╗ ╚════════╝
║ Libs ║
╚════════╝
If address space layout randomization is in use, the addresses of each of the above segments may vary even from one run to the next. For a traditional (non-position-independent) executable, the code (which is read-only and executable), the read-only data, and the initialized data are loaded at fixed addresses, but the addresses of the dynamically linked libraries, the stack, and the heap vary; for a position-independent executable (PIE), the executable's own segments are randomized as well.
There is also no reason why one of the above should have a higher or lower address than another, so I deliberately drew them next to each other, rather than in a single column!
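If you want to see the scattering for yourself, here is a small sketch that prints one address from each kind of region; the variable names and labels are just illustrative choices of mine. Run it a few times: with address space randomization the heap, stack, and library addresses (and, for a position-independent executable, the code and data addresses too) change from run to run, and nothing guarantees any particular ordering between them.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int bss_var;                /* uninitialized data (BSS) */
static int data_var = 42;          /* initialized data */
static const char ro_data[] = "x"; /* read-only data */

int main(void)
{
    int   stack_var = 0;           /* lives on the stack */
    void *heap_ptr  = malloc(16);  /* lives on the heap */

    /* Converting a function pointer to void * is a common POSIXism,
       not strictly portable C, but fine for this illustration. */
    printf("code  : %p  (main)\n",      (void *)main);
    printf("rodata: %p  (ro_data)\n",   (void *)ro_data);
    printf("data  : %p  (data_var)\n",  (void *)&data_var);
    printf("bss   : %p  (bss_var)\n",   (void *)&bss_var);
    printf("heap  : %p  (malloc'd)\n",  heap_ptr);
    printf("stack : %p  (stack_var)\n", (void *)&stack_var);
    printf("libc  : %p  (strlen)\n",    (void *)strlen);

    free(heap_ptr);
    return 0;
}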
Initialized data and uninitialized data are usually placed in one contiguous segment, with only the initialized part loaded from the executable file (its data section). In Unix and POSIX-like systems, the heap follows the uninitialized data, and can be expanded using the brk() or sbrk() syscalls. In POSIXy systems like Linux, and indeed most other systems, a process can also obtain additional "heaps" via (anonymous) memory maps.
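Here is a sketch of both mechanisms on Linux (sbrk() is deprecated in POSIX but still works on Linux; malloc() normally issues these calls for you, so this is purely illustrative):

#define _DEFAULT_SOURCE  /* for sbrk() and MAP_ANONYMOUS with a strict -std= setting */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    /* Current "program break", i.e. the end of the brk()/sbrk() heap. */
    void *break_before = sbrk(0);

    /* Grow the traditional heap by 64 kiB ... */
    if (sbrk(65536) == (void *)-1)
        perror("sbrk");
    void *break_after = sbrk(0);
    printf("program break: %p --> %p\n", break_before, break_after);

    /* ... and obtain a completely separate additional "heap" as an
       anonymous memory map (this is also what malloc() itself does
       internally for large allocations). */
    size_t len = 1048576;  /* one mebibyte */
    char *extra = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (extra == MAP_FAILED)
        perror("mmap");
    else
        printf("anonymous map: %p .. %p\n",
               (void *)extra, (void *)(extra + len));

    return 0;
}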
The initial thread in the process also gets a separate stack segment. Any additional threads will get their own stacks, too.
(A typical exercise when learning to use POSIX threads is to find out how many concurrent threads a process can create. The typical result in Linux is only a hundred or a couple of hundred, and many learners find this very strange. The reason for such a low number is actually the default stack size, which is something like 8 megabytes on current GNU/Linux desktop distributions; the stacks alone for a hundred threads require almost a gigabyte of memory, so the number of concurrent threads is primarily limited by the memory available for their stacks. A non-recursive thread worker function only needs a few dozen kilobytes of stack at most, and it only takes a few lines of code to explicitly set the stack size for newly created pthreads. With that done, the maximum number of concurrent threads in a single process is typically on the order of a thousand or more, usually depending on process limits set by the system administrator or by the distribution defaults.)
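Those few lines of code look something like the following sketch (the 64 kiB figure is just an example; PTHREAD_STACK_MIN is the real lower bound, and the program needs to be compiled and linked with -pthread):

#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <pthread.h>

static void *worker(void *arg)
{
    /* Non-recursive work that only needs a shallow stack goes here. */
    return arg;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t      thread;
    size_t         stack_size = 65536;  /* 64 kiB instead of the ~8 MiB default */

    if (stack_size < PTHREAD_STACK_MIN)
        stack_size = PTHREAD_STACK_MIN;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, stack_size);

    int err = pthread_create(&thread, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    if (err) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return 1;
    }

    pthread_join(thread, NULL);
    return 0;
}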
As you can see in the diagram above, there is no "OS".
In fact, we really do need to split the "Operating System" into two completely separate parts: the kernel (which provides the functionality implemented as system calls), and the libraries (which implement the non-system-call interfaces available to userspace processes, usually starting with the standard C library).
I only drew one "Libs" (for libraries) box above, but in practice, each library tends to get its own separate set of segments in memory.
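You can even watch new library segments appear at run time: if a process loads an extra library with dlopen(), the corresponding mappings show up in /proc/self/maps (more on that file below). A sketch, using libm.so.6 purely as an example library (older glibc versions need -ldl when linking):

#include <stdio.h>
#include <string.h>
#include <dlfcn.h>

/* Print the lines of our own memory map that mention 'name'. */
static void show_mappings(const char *name)
{
    char  line[512];
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return;
    while (fgets(line, sizeof line, maps))
        if (strstr(line, name))
            fputs(line, stdout);
    fclose(maps);
}

int main(void)
{
    printf("Before dlopen():\n");
    show_mappings("libm");  /* prints nothing: libm is not mapped yet */

    void *handle = dlopen("libm.so.6", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    printf("After dlopen():\n");
    show_mappings("libm");  /* now the library has its own segments */

    dlclose(handle);
    return 0;
}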
Let's look at a particular example in Linux (because that's what I'm using right now): the cat command. In Linux, the /sys and /proc filesystems are special pseudo-filesystem trees that do not correspond to any files on any storage media at all, but are constructed by the kernel whenever they are accessed -- essentially, they are kernel-provided realtime views of data known to the kernel. The /proc/self subtree contains information on the "current process" -- that is, on whichever process it is that is examining that directory. (If more than one process examines it at the same time, each sees only its own data, because this is not a normal filesystem, but kernel-created and provided on an as-needed basis.)
The /proc/self/maps (or /proc/PID/maps for a process whose process ID is PID) pseudo-file describes all the memory mappings the process has. If we run cat /proc/self/maps, we can see the mappings for the cat process itself. On my machine (a 64-bit Linux running on the x86-64 architecture) it shows
00400000-0040c000 r-xp 00000000 08:05 2359392 /bin/cat
0060b000-0060c000 r--p 0000b000 08:05 2359392 /bin/cat
0060c000-0060d000 rw-p 0000c000 08:05 2359392 /bin/cat
0215f000-02180000 rw-p 00000000 00:00 0 [heap]
7f735b70f000-7f735c237000 r--p 00000000 08:05 658950 /usr/lib/locale/locale-archive
7f735c237000-7f735c3f6000 r-xp 00000000 08:05 1179825 /lib/x86_64-linux-gnu/libc-2.23.so
7f735c3f6000-7f735c5f6000 ---p 001bf000 08:05 1179825 /lib/x86_64-linux-gnu/libc-2.23.so
7f735c5f6000-7f735c5fa000 r--p 001bf000 08:05 1179825 /lib/x86_64-linux-gnu/libc-2.23.so
7f735c5fa000-7f735c5fc000 rw-p 001c3000 08:05 1179825 /lib/x86_64-linux-gnu/libc-2.23.so
7f735c5fc000-7f735c600000 rw-p 00000000 00:00 0
7f735c600000-7f735c626000 r-xp 00000000 08:05 1179826 /lib/x86_64-linux-gnu/ld-2.23.so
7f735c7fe000-7f735c823000 rw-p 00000000 00:00 0
7f735c823000-7f735c825000 rw-p 00000000 00:00 0
7f735c825000-7f735c826000 r--p 00025000 08:05 1179826 /lib/x86_64-linux-gnu/ld-2.23.so
7f735c826000-7f735c827000 rw-p 00026000 08:05 1179826 /lib/x86_64-linux-gnu/ld-2.23.so
7f735c827000-7f735c828000 rw-p 00000000 00:00 0
7ffeea455000-7ffeea476000 rw-p 00000000 00:00 0 [stack]
7ffeea48b000-7ffeea48d000 r--p 00000000 00:00 0 [vvar]
7ffeea48d000-7ffeea48f000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
The first three are the code (r-xp), read-only data (r--p), and initialized data (rw-p) segments of the cat binary itself.
The process also has a heap proper, marked [heap], from address 0x215f000 up to (but not including) 0x2180000; this is the region that grows when the process calls brk() or sbrk() (so sbrk(0) would return its current end, 0x2180000).
The next segment is a read-only mapping of the current locale data. The C library uses this for the locale-aware interfaces.
The next four segments belong to the C library (libc-2.23.so) itself: code (r-xp), a no-access mapping (---p) that reserves the alignment gap between the library's code and data segments so that nothing else can be mapped into it, read-only data (r--p), and initialized data (rw-p).
The next segment, and the other segments with no name in the last column but with protection mode rw-p, are separate data segments or heaps.
The next three segments are the dynamic linker used in Linux, ld.so. Again, there is a code segment (r-xp), a read-only data segment (r--p), and an initialized data segment (rw-p).
The [stack] segment is the stack for the initial thread. (cat is single-threaded, so it only has one thread.) The [vvar] segment is provided by the kernel, to allow processes direct access to certain kernel-provided data without having to incur the overhead of a syscall. The [vdso] and [vsyscall] segments are also provided by the kernel, to accelerate syscalls that do not need a full context switch into the kernel to complete.
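One way to observe the [vdso] at work (a sketch; the exact behavior depends on the kernel, the architecture, and the C library version) is to call clock_gettime() in a tight loop and run the program under strace. On a typical x86-64 Linux system no clock_gettime syscalls appear in the trace, because glibc satisfies the calls entirely in userspace through the vDSO. (Very old glibc versions need -lrt when linking.)

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;

    /* Each of these calls looks like a syscall, but on typical x86-64
       Linux systems it is handled in userspace via the [vdso] mapping,
       so no switch into the kernel happens.  Try running this under
       "strace -e trace=clock_gettime"; usually no such syscalls show
       up for the loop at all. */
    for (long i = 0; i < 1000000L; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);

    printf("last reading: %lld.%09ld seconds\n",
           (long long)ts.tv_sec, ts.tv_nsec);
    return 0;
}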
So, as you can see, the complete picture is quite a bit more fragmented, but also free-er (as in more free-form), than old C and operating systems books would have you believe.