
In Linux, the fork system call is said to use a copy-on-write (COW) mechanism, thus avoiding the duplication of memory unless it is really needed.

However, I've never seen clear evidence of COW in action, so I thought I'd build my own example.

For this purpose I run a program that computes the square of a matrix with 40,000 rows and columns, each entry being a double (8 bytes). The program forks into 8 processes to do the squaring. The entries are populated at random. Both the initial matrix and the result are allocated with the mmap system call as read-write and shared.
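As a quick sanity check on the sizes involved, the arithmetic can be sketched as follows. The variable names are only illustrative; the dimensions, the entry size and the process count are the ones stated above.

```python
# Size of one 40,000 x 40,000 matrix of 8-byte doubles.
ROWS = COLS = 40_000
BYTES_PER_DOUBLE = 8
PROCESSES = 8

size_bytes = ROWS * COLS * BYTES_PER_DOUBLE   # 12,800,000,000 bytes
size_kib = size_bytes // 1024                 # 12,500,000 KiB (what smaps labels "kB")
size_gib = size_bytes / 2**30                 # about 11.92 GiB

# What it would cost if every process held its own full copy of one matrix.
naive_total_gib = PROCESSES * size_gib        # about 95.4 GiB

print(f"one matrix: {size_kib} KiB = {size_gib:.2f} GiB")
print(f"{PROCESSES} full copies: {naive_total_gib:.1f} GiB")
```

So a single matrix is roughly 11.9GiB, and eight independent copies would need about 95GiB.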

I was hoping that I could see COW in action using top to track memory usage. What I get is the view below. This is at best an example "by contradiction": fork cannot possibly be duplicating the memory of the initial matrix at the time of the call, since the system has only 16GiB of RAM, not 95GiB.

[Image: screenshot of top output]

(Incidentally, this shows that it makes little sense to add up the values of different processes in any of top's memory columns. I wasn't aware of this, which I'm afraid partly explains why I failed to see clearly how copy-on-write is taking place.)

This doesn't seem like compelling evidence of COW in action. From an educational point of view it is a weak example, as all processes "show" 11.9GiB in use. It's not the clear example one would wish for. Instead you need to argue along the lines of "ignore this, ignore that other value, then imagine... and you get where we wanted". Better would be some tool showing the actual portion of independent physical memory each process is using.

Here independent means: if you add the memory usage you see for each process you get the actual, total physical memory the whole program is using.

Since top is the wrong tool for that, what could one use instead?

EDIT: For example, would any of these tools help? If so, how can they be used to get that information? valgrind, smem, /proc/pid/smaps...

EDIT2: Digging further, as hinted here, the best option I've found so far is smaps (or maybe pmap), used as follows:

( echo "Parent:" ; pid=266694 ; grep -A9 "/dev/zero" /proc/$pid/smaps ; echo " "; for ch in $(cat /proc/$pid/task/$pid/children); do echo -e "\n###\nChild: $ch"; grep -A9 "/dev/zero" /proc/$ch/smaps; done ) > out

where the parent process's ID is 266694. The potentially useful part of that output seems to be the following:

Parent:
7fa5d9963000-7fa8d486b000 rw-s 00000000 00:01 18271562 /dev/zero (deleted)
Size:           12500000 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB  
Rss:               35512 kB  
Pss:               35512 kB  
Shared_Clean:          0 kB  
Shared_Dirty:          0 kB 
Private_Clean:         0 kB  
Private_Dirty:     35512 kB
--
7fa8d486b000-7fabcf773000 rw-s 00000000 00:01 18271561 /dev/zero (deleted)
Size:           12500000 kB  
KernelPageSize:        4 kB  
MMUPageSize:           4 kB  
Rss:            12500000 kB  
Pss:             1562500 kB  
Shared_Clean:          0 kB  
Shared_Dirty:   12500000 kB 
Private_Clean:         0 kB  
Private_Dirty:         0 kB
    

###
Child: 266706
7fa5d9963000-7fa8d486b000 rw-s 00000000 00:01 18271562 /dev/zero (deleted)
Size:           12500000 kB  
KernelPageSize:        4 kB  
MMUPageSize:           4 kB 
Rss:               35872 kB  
Pss:               35872 kB  
Shared_Clean:          0 kB  
Shared_Dirty:          0 kB  
Private_Clean:         0 kB  
Private_Dirty:     35872 kB  
--
7fa8d486b000-7fabcf773000 rw-s 00000000 00:01 18271561 /dev/zero (deleted)
Size:           12500000 kB  
KernelPageSize:        4 kB  
MMUPageSize:           4 kB  
Rss:            12500000 kB  
Pss:             1562500 kB  
Shared_Clean:          0 kB  
Shared_Dirty:   12500000 kB 
Private_Clean:         0 kB  
Private_Dirty:         0 kB

and the rest of the children showing roughly the same values. I'm not sure yet how to interpret this correctly.
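One possible reading, offered as a sketch rather than a definitive interpretation: Rss counts every resident page a process maps, so a page shared by several processes is counted once in each of them, whereas Pss charges each process only its proportional share of such pages. Pss is therefore the column that can meaningfully be added across processes. In the output above, 12500000 kB / 1562500 kB = 8 for the second region, which is consistent with eight processes mapping the same physical pages when the snapshot was taken. The helper below (`sum_smaps_fields` is a hypothetical name, not an existing tool) sums selected counters over the /dev/zero mappings of one smaps dump; the sample text reuses the parent's values shown above.

```python
import re

HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")   # e.g. "7fa5d9963000-7fa8d486b000 rw-s ..."
FIELD = re.compile(r"^(\w+):\s+(\d+)\s+kB")      # e.g. "Pss:   1562500 kB"


def sum_smaps_fields(smaps_text, name_filter="/dev/zero",
                     fields=("Rss", "Pss", "Shared_Dirty", "Private_Dirty")):
    """Sum the chosen kB counters over all mappings whose header contains name_filter."""
    totals = dict.fromkeys(fields, 0)
    keep = False
    for raw in smaps_text.splitlines():
        line = raw.strip()
        if HEADER.match(line):
            keep = name_filter in line           # a new mapping starts here
            continue
        m = FIELD.match(line)
        if keep and m and m.group(1) in totals:
            totals[m.group(1)] += int(m.group(2))
    return totals


# The parent's two /dev/zero regions, copied from the output above.
PARENT_SAMPLE = """\
7fa5d9963000-7fa8d486b000 rw-s 00000000 00:01 18271562 /dev/zero (deleted)
Size:           12500000 kB
Rss:               35512 kB
Pss:               35512 kB
Shared_Dirty:          0 kB
Private_Dirty:     35512 kB
7fa8d486b000-7fabcf773000 rw-s 00000000 00:01 18271561 /dev/zero (deleted)
Size:           12500000 kB
Rss:            12500000 kB
Pss:             1562500 kB
Shared_Dirty:   12500000 kB
Private_Dirty:         0 kB
"""

parent_totals = sum_smaps_fields(PARENT_SAMPLE)
print(parent_totals)
```

Applied to the real files, one would pass in the contents of /proc/$pid/smaps for the parent and each child and add up the Pss totals; that sum, unlike the sum of the Rss values, should approximate the physical memory the whole program occupies.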

However, I'm afraid my initial question has more to do with (*nix) memory management and the tools available for inspecting it, with top and ps clearly useless here (see (ext)). And that is a completely different topic, of which I know only an eps>0. Maybe I should change the title or remove the question altogether...

MASL
  • If you mmap memory with `MAP_SHARED`, that memory is shared (not copied) on a fork. Only if you mmap with `MAP_PRIVATE` will the memory be copied (with COW) on fork, – Chris Dodd Oct 27 '20 at 02:30
  • I know that "theory". I'm looking for a way to actually showcase that indeed that's what happening. If I had to teach this topic, how can I do it so that the students don't just have to take my word for it? – MASL Oct 27 '20 at 02:45
  • It seems like your `top` snapshots show this quite clearly -- the 11.9GB used by the shared matrix is shared by all the processes. The total memory used is only a little higher -- the children need to copy a couple of stack pages and maybe a global page and that's it. – Chris Dodd Oct 27 '20 at 03:26
  • Ah, I figured that much after realizing what I mentioned about `top`: memory values for different processes can't be added! Yet, from an educational point of view, this is a weak example, as all processes "show" 11.9GiB in use. It's not the compelling, clear example one could wish for. Instead you need to argue along the lines of "ignore this, ignore that other value, then imagine... and you get where we wanted". Better would be some tool showing the actual portion of physical memory each process is using. – MASL Oct 27 '20 at 04:32
  • You could fill some page of the process's memory with some distinctive data, fork a bunch of times, then search through `/dev/mem` and (hopefully) find only one physical page with those contents. Maybe `mlock()` it to make sure it isn't swapped out. – Nate Eldredge Oct 27 '20 at 04:44
  • Though looking at the man page for the `mem` device, you may have to configure your kernel without `CONFIG_STRICT_DEVMEM` to be able to do that. – Nate Eldredge Oct 27 '20 at 04:47
  • @NateEldredge Yikes, that's getting more complicated than I hoped for. Thanks for the pointer to /dev/mem. If I understand the man page, accessing it (programmatically) may not be safe? ("Examining and patching is likely to lead to unexpected results when read-only or write-only bits are present.") – MASL Oct 27 '20 at 04:54
  • Indirectly, I may be facing the same problem as the OP in [here](https://stackoverflow.com/questions/131303/how-can-i-measure-the-actual-memory-usage-of-an-application-or-process). Maybe the only way is profiling the program with something like Valgrind (`valgrind --tool=massif exec args`) ? – MASL Oct 27 '20 at 05:05
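
The `MAP_SHARED` versus `MAP_PRIVATE` distinction raised in the comments can be observed directly, at least at the level of what each process sees. The following is a minimal sketch using Python's `mmap` and `os` modules on Linux; the function name is hypothetical. It demonstrates the visibility semantics only: it does not show by itself that the physical pages of a private mapping stay shared until the first write.

```python
import mmap
import os


def parent_view_after_child_write(flags):
    """Map one anonymous page, fork, let the child overwrite it,
    and return what the parent reads back afterwards."""
    # fileno=-1 requests an anonymous mapping (no backing file).
    region = mmap.mmap(-1, mmap.PAGESIZE, flags=flags)
    region[0:6] = b"parent"

    pid = os.fork()
    if pid == 0:
        # Child: write into the inherited mapping, then exit immediately
        # without running the parent's cleanup handlers.
        region[0:5] = b"child"
        os._exit(0)

    os.waitpid(pid, 0)            # the child's write has happened by now
    seen = bytes(region[0:6])
    region.close()
    return seen


if __name__ == "__main__":
    # Shared mapping: the child's write lands in the page the parent also maps.
    print("MAP_SHARED :", parent_view_after_child_write(mmap.MAP_SHARED))
    # Private mapping: the child's write goes to the child's own copy.
    print("MAP_PRIVATE:", parent_view_after_child_write(mmap.MAP_PRIVATE))
```

With `MAP_SHARED` the parent reads back `b'childt'` (the child overwrote the first five bytes in place), while with `MAP_PRIVATE` it still reads `b'parent'`.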
