
For two processes A and B that both use the library libc.so, libc.so is loaded into memory only once. This is the normal situation when A and B both run on the same host with the same rootfs.

When it comes to containers, if A and B are running in different containers, do A and B share the same memory area?

For example:

imageA

- libc.so
- programA

imageB

- libc.so
- programB

We use chroot to run A and B in different rootfs. The two libc.so files are identical. Will libc.so be loaded into memory twice?

Xinli Niu
    you can run a container holding only `libc.so` and then link A and B container to it. Or use volumes to share it. – Yuriy Kravets Mar 08 '16 at 10:41
  • I know this would share the library. But I wonder whether A and B will share the same RAM area if they are running in different containers. – Xinli Niu Mar 08 '16 at 10:46

2 Answers


Actually, processes A & B that use a shared library libc.so can share the same memory. Somewhat unintuitively, it depends on which Docker storage driver you're using. If you use a storage driver that exposes the shared library files as originating from the same device/inode when they reside in the same Docker layer, then they will share the same virtual memory cache pages. With the aufs, overlay, or overlay2 storage drivers your shared libraries will share memory, but with any of the other storage drivers they will not.
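
If you want to verify this yourself, one way (my own sketch, not part of this answer; it assumes you can read `/proc/<pid>/maps` for both processes from the host, which usually needs root) is to compare the device and inode that each containerized process has mapped for the library:

```c
/* share_check.c -- hypothetical helper, not from the answer above.
 * Prints the device and inode of every mapping whose path contains
 * <libname> for a given <pid>, by parsing /proc/<pid>/maps.
 * If two processes in different containers report the same device/inode
 * pair for libc.so, the kernel can back both mappings with the same
 * page-cache pages. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <libname>\n", argv[0]);
        return 1;
    }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    /* /proc/<pid>/maps format: address perms offset dev inode pathname */
    char line[512];
    while (fgets(line, sizeof(line), f)) {
        if (!strstr(line, argv[2]))
            continue;
        char addr[64], perms[8], offset[16], dev[16];
        unsigned long inode;
        if (sscanf(line, "%63s %7s %15s %15s %lu",
                   addr, perms, offset, dev, &inode) == 5)
            printf("pid %s  dev=%s inode=%lu\n", argv[1], dev, inode);
    }
    fclose(f);
    return 0;
}
```

Identical device/inode pairs for the same library across two containers mean the kernel sees one file and can reuse its pages; different pairs mean each container ends up with its own copy in RAM.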

I'm not sure why this detail isn't made more explicitly obvious in the Docker documentation. Or maybe it is and I've just missed it. It seems like a key differentiator if you're trying to run dense containers.

Yeroc
    This means that - containers using the same image or base image (assuming no intermediate image has modified the image) will share memory - containers using the same libraries path but different images won't share memory (because images aka layers are different files on disk) – untore Mar 24 '17 at 17:24
  • @Yeroc Very interesting answer, thank you for this. Do you have any "evidence" or research that proves that this is really the way it is? – Per Lundberg Apr 26 '18 at 10:02
  • @PerLundberg Yes, at the time I did quite a bit of research to verify this. This involved understanding how page caching works on Linux plus how files are exposed by the storage driver to the kernel. It looks like the documentation has improved somewhat. At least for the Btrfs & OverlayFS drivers, interaction with the page cache is now explicitly mentioned under the performance section of the documentation. For AUFS it simply says "uses the page cache very efficiently." For ZFS it suggests some caching is available, but it's unclear if it's the page cache. – Yeroc Apr 26 '18 at 23:29
  • Also, you will likely find some people that talk about KSM (Kernel Same-page Merging), which is another way to achieve the same goal of sharing / deduplicating memory pages, albeit at a much higher cost, since it operates above the page cache and so has no knowledge of whether two different pages in memory are actually sourced from the same bytes on disk. – Yeroc Apr 26 '18 at 23:33
  • @untore It's really impressive that Docker avoids memory duplication for base images, where did you find this information? I'd like to read more about it. Is there a memory or speed penalty for something like this? Or does Docker just start with the lowest level container and allocate each inherited container afterward so all containers that need (e.g.) glibc have it in the same place? – jrh Nov 28 '18 at 21:00
  • Just found this answer on [How is Docker different from a virtual machine?](https://stackoverflow.com/a/16048358/4975230), it would appear that some of this functionality comes with the filesystem Docker uses (aufs). – jrh Nov 28 '18 at 21:51
  • @jrh I wouldn't exactly use the word "impressive"; this is the default state of affairs and is all handled by the kernel. I don't believe Docker and its storage drivers do anything particular (or at all) in this regard. You should look at it the other way around: because of the way *some* storage drivers work (e.g. block-level drivers), identical files in shared layers will not appear to be from the same base filesystem to the kernel and so the kernel will not be able to reuse pages. For other drivers, it all works fine. Remember "containers" are just normal processes; no performance penalty. – tne Jul 10 '19 at 02:33

You can figure out whether the .so files in different containers share the same physical memory by comparing the physical addresses of the processes (via /proc/pid/pagemap) from two different containers, as seen on the host.

# ps -ef | grep java | grep -v grep | awk '{ print $2 }'
3906
4018
# sudo pmap -X 3906 | grep -e "Address" -e "libpthread"
     Address Perm   Offset Device     Inode    Size   Rss   Pss Referenced Anonymous LazyFree ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping
7f97d9249000 r-xp 00000000  fd:00 135202206     104   104    52        104         0        0              0              0               0    0       0      0 libpthread-2.27.so
# sudo pmap -X 4018 | grep -e "Address" -e "libpthread"
     Address Perm   Offset Device     Inode    Size   Rss   Pss Referenced Anonymous LazyFree ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping
7fce739e1000 r-xp 00000000  fd:00 135202206     104   104    52        104         0        0              0              0               0    0       0      0 libpthread-2.27.so
# virt_to_phys_user 3906 0x7f97d9249000
0x59940000
# virt_to_phys_user 4018 0x7fce739e1000
0x59940000

Here 3906 and 4018 are the host process IDs of two instances of a Java application running in two different containers. I used virt_to_phys_user, a simple C program from this link that dumps the physical address given a pid and a virtual address. Notice that the physical address is the same for both processes above. Also note that both instances map the same device and inode, and the Pss column indicates that these pages are being shared.
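
For readers who don't want to follow the link, here is a minimal sketch of what a virt_to_phys_user-style lookup does (my own illustration, not the linked program; it assumes the pagemap format documented by the kernel: one 8-byte entry per virtual page, bit 63 = page present, bits 0-54 = page frame number, and root privileges to see the PFN):

```c
/* pagemap_lookup.c -- illustrative sketch, not the linked virt_to_phys_user.
 * Translates one virtual address of a process to a physical address by
 * reading /proc/<pid>/pagemap. Must run as root (newer kernels report the
 * PFN as 0 to unprivileged callers). */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <vaddr-hex>\n", argv[0]);
        return 1;
    }

    int pid = atoi(argv[1]);
    uint64_t vaddr = strtoull(argv[2], NULL, 16);
    long page = sysconf(_SC_PAGESIZE);

    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/pagemap", pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return 1; }

    /* One 64-bit pagemap entry per virtual page. */
    uint64_t entry;
    off_t off = (off_t)(vaddr / page) * sizeof(entry);
    if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
        perror("pread");
        return 1;
    }
    close(fd);

    if (!(entry & (1ULL << 63))) {                 /* bit 63: page present */
        fprintf(stderr, "page not present\n");
        return 1;
    }

    uint64_t pfn = entry & ((1ULL << 55) - 1);     /* bits 0-54: PFN */
    printf("0x%llx\n", (unsigned long long)(pfn * page + vaddr % page));
    return 0;
}
```

Compile it and pass the host PID and the virtual address reported by pmap, as in the session above; two processes mapping the same page-cache page will print the same physical address.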

However, as the earlier answer mentioned, this behaviour depends on the storage driver used. I see that this works with docker-ce on Ubuntu 18.04 and with podman on RHEL 8 (overlay2 and overlay filesystems respectively), but it didn't work on RHEL 7.5 with devicemapper.

dino