Is anonymous memory (i.e. program heap and stack) part of the page cache on Linux? The linked kernel documentation does not say.

But the Wikipedia entry about Page Cache contains a graphic (look at the top right) which gives me the impression that malloc() allocates dynamic memory within the page cache:
[Diagram: Linux storage stack, from Thomas Krenn]

Does that make sense? For mmap(), using the page cache makes sense when it is used to access files. But does that also hold for anonymous memory in general, e.g. malloc() and anonymous mappings created through mmap()?

I would appreciate some explanation.

Thank you.

Edit 2021-03-14
I decided it was best to ask the maintainers of the kernel's memory subsystem on their mailing list. Luckily, Matthew Wilcox responded and helped me. Extract:

  • Anonymous memory is not handled by the page cache.
  • Anonymous pages are handled in a number of different ways -- they can be found on LRU lists (Least Recently Used) and they can be found through the page tables. Somewhat ad-hoc.
  • The Wikipedia diagram is wrong, and it contains further flaws.
  • If a system has swap and anonymous memory is swapped out, it enters the swap cache, not the page cache.
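One way to observe this split on a running system is /proc/meminfo. As far as I know, recent kernels report these categories as separate counters: AnonPages for anonymous memory, Cached for the page cache (which, to my understanding, includes Shmem, i.e. tmpfs and shared anonymous pages), and SwapCached for the swap cache, which is not counted in Cached. The Python sketch below only parses and groups those fields under these assumptions; the helper names parse_meminfo and summarize are hypothetical.

```python
# Minimal sketch: group /proc/meminfo counters by cache type (Linux only).
# Assumes the fields AnonPages, Cached, Shmem and SwapCached exist, as on
# recent kernels; missing fields are reported as 0.


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # most values are in kB
    return fields


def summarize(fields: dict[str, int]) -> dict[str, int]:
    """Pick out the counters relevant to page cache vs. anonymous memory."""
    return {
        "anonymous (not page cache)": fields.get("AnonPages", 0),
        "page cache (Cached)": fields.get("Cached", 0),
        "  of which shmem/tmpfs": fields.get("Shmem", 0),
        "swap cache (separate)": fields.get("SwapCached", 0),
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            summary = summarize(parse_meminfo(f.read()))
    except FileNotFoundError:
        print("/proc/meminfo not available (not Linux?)")
    else:
        for label, kb in summary.items():
            print(f"{label:28} {kb:>10} kB")
```

Keeping the parser separate from the file access makes it easy to run against a saved copy of /proc/meminfo from another machine.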

The discussion can be read here or here.

Edited by trincot; asked by Peter.
  • Anonymous memory is backed by the swap area, not the filesystem. Other than that, it's essentially the same as any other virtual memory. – Barmar Mar 11 '21 at 17:20
  • I’m voting to close this question because this isn't a programming question. It seems more appropriate for Computer Science Stack Exchange. – Barmar Mar 11 '21 at 17:21
  • Hi! I relate the question to the implementation in source code, which is why I'm asking here. My other candidate is [Unix & Linux](https://unix.meta.stackexchange.com/questions/314/unix-c-api-calls-ontopic) because I'm asking about Linux. Computer Science could also be possible, if we say it is about computer architecture? – Peter Mar 11 '21 at 17:52
  • Yes, that's what I'm saying. This is a general concept in operating system design, although some parts of it may be specific to Unix. – Barmar Mar 11 '21 at 19:13
  • I copied it to [Computer Science](https://cs.stackexchange.com/q/136492/132961). I hope it is better placed there. Thanks! – Peter Mar 11 '21 at 20:04
  • I’ve copied it to Computer Science. – Peter Mar 11 '21 at 20:13
  • I respectfully disagree with Barmar. The question is specifically about Linux and a thorough answer would include Linux kernel programming materials/references. Linux tags don't even exist on Computer Science. Furthermore, there are only hundreds of followers in the linked Computer Science tags, but the Linux tag alone has 164k followers. Hopefully this question stays open as someone is more likely to answer here. – wxz Mar 11 '21 at 20:37
  • I know that over-regulation is an issue. I myself am bewildered by the number of similar sites here and their overlapping scopes. In any case, I'm trying to get an answer :) – Peter Mar 11 '21 at 20:47
  • CS has closed the question and suggests other sites, namely SO. I removed my own vote to close on SO and hope someone answers here. I hope that, when in doubt, questions are accepted on the broader site in the interest of knowledge. – Peter Mar 11 '21 at 21:06

1 Answer

TLDR: No, except for anonymous memory with special filesystem backing (like IPC shmem).


Update: I corrected the answer to incorporate new information from the kernel mailing-list discussion with the OP.


The page cache was originally meant to be an OS-level region of memory for fast lookup of disk-backed files, and in its original form it was a buffer cache (meant to cache blocks from disk). The notion of a page cache came about later, in 1995, after Linux's inception, but the premise was similar, just with a new abstraction: pages [1]. Eventually the two caches became one: the page cache included the buffer cache, or rather, the buffer cache is the page cache [1, 2].

So what does go in the page cache? Aside from traditional disk-backed files, Linux, in an attempt to make the page cache as general-purpose as possible, has a few page types that don't adhere to the traditional notion of disk-backed pages yet are still stored in the page cache. Of course, as mentioned, the buffer cache (which is the same as the page cache) stores disk-backed blocks of data. Blocks aren't necessarily the same size as pages; in fact, they can be smaller than pages [p. 323 of 3]. In that case, a page in the buffer cache might consist of multiple blocks corresponding to non-contiguous regions on disk. I'm unclear whether each page in the buffer cache must then map to a single file, or whether one page can consist of blocks from different files. Nonetheless, this is one page cache usage that doesn't adhere to the strictest definition of the original page cache.
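To make the block-versus-page relationship concrete, here is a small arithmetic sketch. It assumes 4 KiB pages and 1 KiB filesystem blocks (both are assumptions; real values depend on the architecture and the filesystem), and the function name locate is hypothetical. It maps a byte offset within a file to the page index the page cache would use for that file and to the block-sized buffer inside that page; where those blocks live on disk is a separate question and need not be contiguous.

```python
# Hypothetical illustration, assuming 4 KiB pages and 1 KiB filesystem blocks.
# A file's cached pages are indexed by page number within the file; each page
# can hold several block-sized buffers under these assumptions.

PAGE_SIZE = 4096   # assumed page size in bytes
BLOCK_SIZE = 1024  # assumed filesystem block size in bytes (smaller than a page)


def locate(offset: int, page_size: int = PAGE_SIZE,
           block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (page index, block index within that page)."""
    if page_size % block_size != 0:
        raise ValueError("block size must divide the page size evenly")
    page_index = offset // page_size
    block_in_page = (offset % page_size) // block_size
    return page_index, block_in_page


blocks_per_page = PAGE_SIZE // BLOCK_SIZE  # 4 buffers per page with these sizes

if __name__ == "__main__":
    print(blocks_per_page)  # 4
    print(locate(10_000))   # (2, 1): byte 10000 is in page 2, block 1 of that page
```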

Next is the swap cache. As Barmar mentioned in the comments, anonymous (non-file-backed) pages can be swapped out to disk. On the way to disk and back, pages are put in the swap cache. The swap cache reuses the same kind of data structures as the page cache, specifically the address_space struct, albeit with swap flags set and a few other differences [p. 731 of 4, 5]. However, since the swap cache is considered separate from the page cache, anonymous pages in the swap cache are not considered to be "in the page cache."

Finally, the question of whether mmap/malloc allocate memory in the page cache. As discussed in [5], mmap typically uses memory that comes from the free page list, not the page cache (unless there are no free pages left, I assume). When mmap is used to map files for reading and writing, those pages do end up residing in the page cache. However, anonymous pages obtained through mmap or malloc do not normally reside in the page cache.
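The difference between the two kinds of mapping can be sketched from user space with Python's mmap module. On Linux, to my knowledge, a write made through a MAP_SHARED file mapping is visible to a plain pread() without calling msync(), because both access the same page-cache pages of the file. An anonymous mapping (fileno -1) has no file behind it, so there is no page-cache page to share. This is an illustration under those assumptions, not a look at kernel internals; the function names are hypothetical.

```python
# Minimal sketch for Linux (Unix systems with a unified page cache behave
# similarly). A MAP_SHARED file mapping and pread() see the same bytes
# because both go through the file's page-cache pages.
import mmap
import os
import tempfile


def file_mapping_demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)          # make the file one page long
        with mmap.mmap(fd, mmap.PAGESIZE) as m:  # MAP_SHARED, read/write by default on Unix
            m[:5] = b"hello"                     # store through the mapping
            return os.pread(fd, 5, 0)            # load through the read(2) path, no msync()
    finally:
        os.close(fd)
        os.unlink(path)


def anonymous_mapping_demo() -> bytes:
    # fileno -1 requests anonymous memory: no file, hence no page-cache page;
    # MAP_PRIVATE keeps it private to this process.
    with mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE) as m:
        m[:5] = b"anon!"
        return m[:5]


if __name__ == "__main__":
    print(file_mapping_demo())       # b'hello'
    print(anonymous_mapping_demo())  # b'anon!'
```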

One exception is anonymous memory that has special filesystem backing. For instance, shared memory mmap'd between processes for IPC is backed by the RAM-based tmpfs [6]. This memory sits in the page cache, but it is anonymous because it has no disk-backed file [p. 600 of 4].
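The shared-memory exception can be illustrated with a MAP_SHARED anonymous mapping inherited across fork(). To my understanding, Linux backs such a mapping with an internal shmem (tmpfs) object, which is why it is accounted under Shmem, and thus Cached, in /proc/meminfo rather than under AnonPages. The sketch below is Unix-only because it uses os.fork(); shared_anonymous_demo is a hypothetical name.

```python
# Minimal sketch (Unix only, uses os.fork): a MAP_SHARED anonymous mapping.
# On Linux this memory is, to my understanding, backed by shmem/tmpfs.
import mmap
import os


def shared_anonymous_demo() -> bytes:
    # fileno -1 requests anonymous memory; MAP_SHARED keeps the same pages
    # shared between parent and child after fork().
    with mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED) as shm:
        pid = os.fork()
        if pid == 0:            # child: write into the shared page, then exit
            shm[:6] = b"shared"
            os._exit(0)
        os.waitpid(pid, 0)      # parent: wait until the child has written
        return shm[:6]          # the child's write is visible in the parent


if __name__ == "__main__":
    print(shared_anonymous_demo())  # b'shared'
```

With MAP_PRIVATE instead of MAP_SHARED, the child would get a copy-on-write private copy and the parent would read zero bytes instead.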

Sources:

  1. https://lwn.net/Articles/712467/
  2. https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
  3. https://www.doc-developpement-durable.org/file/Projets-informatiques/cours-&-manuels-informatiques/Linux/Linux%20Kernel%20Development,%203rd%20Edition.pdf
  4. https://doc.lagout.org/operating%20system%20/linux/Understanding%20Linux%20Kernel.pdf
  5. https://lore.kernel.org/linux-mm/20210315000738.GR2577561@casper.infradead.org/
  6. https://github.com/torvalds/linux/blob/master/Documentation/filesystems/tmpfs.rst
Answered by wxz
  • This makes it sound like stack memory can't be swapped out, but surely it must be. – Nate Eldredge Mar 12 '21 at 05:02
  • I think you're correct; I was thinking about the non-swap page cache when I wrote about the stack, as in the case of mmap/malloc. – wxz Mar 12 '21 at 05:13
  • Please read the edit to my question. I'm afraid your answer is not correct regarding anonymous memory (stack and heap). I'm refraining from answering my own question here because Matthew Wilcox actually did that. – Peter Mar 14 '21 at 14:12
  • @Peter Thanks for trying to get an up-to-date answer from the kernel mailing list. I'm interested in exploring this further. I'm going to jump on the email chain if you don't mind. – wxz Mar 14 '21 at 22:10
  • I corrected my answer to include the new information. Feel free to suggest edits if I missed something so that we have a nice mostly up-to-date answer on SO. – wxz Mar 15 '21 at 03:32
  • Good work. We gained knowledge and met helpful people! I'm not familiar with Wikipedia internals, but I removed the diagram from the "page cache" entry and am now looking into what best to do with the other entries that include the diagram. – Peter Mar 15 '21 at 15:12