4

Say I have a process in Linux from which I fork() another identical process. After the fork, as the original process starts writing to memory, the Linux copy-on-write mechanism gives it its own physical memory pages, different from the ones used by the forked process.

How can I, at some point of execution, know which pages of the original process have been copied-on-write?

I don't want to make all the pages read-only at the start and catch the writes in a SIGSEGV signal handler, as that adds overhead I don't want.

Peter Cordes
pythonic
  • 2
    I suspect that this is done at a very deep level in the kernel – JosephH Apr 23 '12 at 16:13
  • 3
    [This](http://stackoverflow.com/questions/3060577/can-the-dirtiness-of-pages-of-a-mmap-be-found-from-userspace) should help you a bit. – jmkeyes Apr 23 '12 at 16:41
  • 1
    Ummm, I think _copied-on-write_ would sound better. – ninjalj Apr 23 '12 at 17:51
  • 5
    Why do you want to do that? Is it simply curiosity? Because if it isn't and you are planning to actually use this, then it sounds as if you need to reconsider your application design... – thkala Apr 23 '12 at 18:14
  • getrusage() will give you the RSS and the number of blocks in/out. If you want to know *which* blocks were faulted in ("are present"), you are on your own, I think. – wildplasser Apr 24 '12 at 10:58

2 Answers

3

Tracing the fork() and clone() syscalls through the kernel:

copy_process() -> copy_mm() -> dup_mm() -> dup_mmap(). In dup_mmap() you will find the loop that goes through the address space VMA by VMA and marks each one as "copy-on-write". These slides cover the details:

http://www.cs.columbia.edu/~krj/os/lectures/L17-LinuxPaging.pdf

http://www.cs.columbia.edu/~junfeng/13fa-w4118/lectures/l20-adv-mm.pdf

Essentially (refer to the slides), if the PTE is not writable (a hardware-level flag) and the VMA is marked as writable (a software-level flag), then the page is copy-on-write.

You can always write a kernel module that performs this check.

Peter Teoh
  • Is there an existing mechanism for a user-space process to gather this information? It sounds ideal: no added overhead for anything except the actual query to find out which pages are still shared and which aren't. Err I guess probably no mechanism exists, since you suggest writing a kernel module for it. – Peter Cordes Apr 11 '16 at 05:43
  • 2
    Not sure at the moment: but potentially you can use /proc//pagemap (for the PTE) + /proc/vmallocinfo (for the VMA) to extract out the information. See Documentation/vm/pagemap.txt – Peter Teoh Apr 11 '16 at 06:46
0

You probably have to accept some overhead.

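If you are privileged, you can pread /proc/self/pagemap (one 64-bit entry per page, at offset 8*(addr / PAGE_SIZE)) to get the PFN (bits 0-54 of the entry, meaningful when bit 63, "page present", is set). Then look up that PFN in /proc/kpagecount, which holds a 64-bit mapping count per PFN, to see if the page is shared: a count above 1 means more than one mapping still refers to it.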
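The pagemap lookup described above can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the helper names (`pagemap_offset`, `pagemap_pfn`, `pagemap_read`) and the `PM_*` macros are hypothetical, and the bit positions follow the layout given in the kernel's pagemap documentation (PFN in bits 0-54, "exclusively mapped" in bit 56 on Linux 4.2 and later, "present" in bit 63).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit layout taken from the kernel's pagemap documentation (assumed here). */
#define PM_PRESENT   (1ULL << 63)        /* page is present in RAM */
#define PM_EXCLUSIVE (1ULL << 56)        /* page is mapped exactly once (Linux >= 4.2) */
#define PM_PFN_MASK  ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Byte offset of the 64-bit pagemap entry that describes addr. */
off_t pagemap_offset(uintptr_t addr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(addr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* PFN of a present page; 0 if the page is absent or the kernel hid the PFN. */
uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Read the entry for addr from an open pagemap file descriptor.
 * Returns 0 on success, -1 on a failed or short read. */
int pagemap_read(int fd, const void *addr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t off = pagemap_offset((uintptr_t)addr, page_size);
    return pread(fd, entry, sizeof *entry, off) == (ssize_t)sizeof *entry ? 0 : -1;
}
```

A caller would open /proc/self/pagemap with `open(..., O_RDONLY)`, pass the descriptor and an address inside the region of interest to `pagemap_read()`, and then decode the entry. On kernels that report bit 56, testing `entry & PM_EXCLUSIVE` may be enough to tell whether a present page is still shared, without needing the PFN or /proc/kpagecount at all; treat that as an assumption to verify on the target kernel.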

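If you don't have privilege, you can compare the PFNs in the pagemaps of the parent and the child: a page whose PFN differs between the two has been copied. Note that since Linux 4.2 the kernel reports the PFN as zero to processes without CAP_SYS_ADMIN, so this comparison only works on older kernels.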

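You can tell whether any of the pages in a mapping are shared by comparing its Pss (proportional set size) with its Rss (resident set size) in /proc/self/smaps: Pss counts each page divided by the number of processes mapping it, so Pss is lower than Rss whenever at least one resident page is shared.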
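As a rough illustration of the Pss comparison, the sketch below sums the `Rss:` and `Pss:` fields over every mapping in an smaps file. The function names (`smaps_field_kb`, `smaps_totals`) are hypothetical, and the code assumes the usual `Key:   value kB` line format. It only tells you that some resident pages are shared, not which ones, and shared file-backed pages such as libraries lower Pss as well, so comparing per mapping is more informative than comparing the totals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If line starts with key (for example "Pss:"), store the value in kB
 * and return 1; otherwise return 0. */
int smaps_field_kb(const char *line, const char *key, long *kb)
{
    assert(line && key && kb);
    size_t n = strlen(key);
    if (strncmp(line, key, n) != 0)
        return 0;
    return sscanf(line + n, "%ld", kb) == 1;
}

/* Sum Rss and Pss (in kB) over all mappings listed in an smaps file.
 * Returns 0 on success, -1 if the file cannot be opened. */
int smaps_totals(const char *path, long *rss_kb, long *pss_kb)
{
    char line[512];
    long v;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    *rss_kb = 0;
    *pss_kb = 0;
    while (fgets(line, sizeof line, f)) {
        if (smaps_field_kb(line, "Rss:", &v))
            *rss_kb += v;
        else if (smaps_field_kb(line, "Pss:", &v))
            *pss_kb += v;
    }
    fclose(f);
    return 0;
}
```

Calling `smaps_totals("/proc/self/smaps", &rss, &pss)` in the parent after the fork and finding `pss < rss` suggests that some resident pages are still shared.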

Andy Lutomirski