From what I've gathered here:

Memory fragmentation seems to no longer be an issue in a 64-bit virtual address space, so why do the garbage collectors of some popular language runtimes (V8 for JS, the JVM, etc.) still need to compact memory after mark-sweep to prevent heap fragmentation?

Duy Phan

1 Answer

(V8 developer here.)

Address space fragmentation is not the same as VM heap fragmentation.

On 32-bit systems, an attempt to allocate, say, a 256MB object can fail surprisingly often even when 2-3 GB of memory are available in total, because existing objects are spread out all across the address space, so that no contiguous 256MB region can be found any more. That's a problem that 64-bit systems (usually) don't have.
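To make this concrete, here's a toy C simulation (my own illustration, not code from any real allocator; the 1MB granularity, the random layout, and the seed are all made up):

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy model: a 2GB 32-bit address space tracked in 1MB slots.
   Scatter 512 one-megabyte objects at random, then look for the
   largest contiguous run of free slots. */
#define SPACE_MB 2048
#define LIVE_MB  512

int main(void) {
    static unsigned char used[SPACE_MB];   /* zero-initialized: all free */
    srand(42);
    for (int i = 0; i < LIVE_MB; i++)
        used[rand() % SPACE_MB] = 1;       /* one 1MB object per mark */

    unsigned free_total = 0, gap = 0, best = 0;
    for (int i = 0; i < SPACE_MB; i++) {
        if (used[i]) {
            gap = 0;                       /* a run of free slots ends here */
        } else {
            free_total++;
            if (++gap > best) best = gap;  /* track the longest free run */
        }
    }
    printf("%u MB free in total, but the largest contiguous free region "
           "is only %u MB\n", free_total, best);
    return 0;
}
```

With this (arbitrary) layout, the largest free region comes out far below 256MB even though well over a gigabyte is free in total, so the 256MB allocation would fail.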

VMs with garbage collectors usually organize their managed heap in "pages". As a mental model, you may assume that each page is 1MB in size. The VM can add pages to its heap (when it needs more) and give them back to the operating system (when they're empty).

Now, it can happen that a page was heavily used, then most objects on it died, and the entire 1MB page is kept alive by a single object that's just a few bytes in size. When an application goes through a phase where it needs lots of memory (and hence many heap pages), and then that operation completes and most objects become unreachable, it can happen that most pages on the heap are mostly empty, each used by only a few small objects. That's a particular form of wasting memory: the VM needs to hold on to many heap pages, but the total size of all live objects is much smaller than the total size of all heap pages (which in turn is [part of] the amount of memory that the process is using from the operating system's point of view).
That's heap fragmentation. Whether you're running on a 32-bit or 64-bit system has nothing to do with it. And the way to avoid it is to have a "compacting" garbage collector, i.e. to have it move objects together so that some pages become entirely free and can be given back to the operating system.
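To put numbers on it, here's another toy model (the per-page live counts and the 1MB page size are assumptions, not measurements from any real VM) showing how compaction lets pages go back to the OS:

```c
#include <stdio.h>

/* Toy model: a heap of 1MB pages where, after mark-sweep, each
   page retains only a few KB of live objects. */
#define NPAGES  100
#define PAGE_KB 1024

int main(void) {
    unsigned live_kb[NPAGES], total_live = 0;
    for (int i = 0; i < NPAGES; i++) {
        live_kb[i] = 4 + (i % 3);          /* assume ~4-6 KB live per page */
        total_live += live_kb[i];
    }
    printf("Before compaction: %d pages held (%d KB), %u KB actually live\n",
           NPAGES, NPAGES * PAGE_KB, total_live);

    /* Compaction moves all survivors together; the remaining pages
       become entirely free and can be given back to the OS. */
    unsigned needed = (total_live + PAGE_KB - 1) / PAGE_KB;
    printf("After compaction:  %u page(s) held, %d returned to the OS\n",
           needed, NPAGES - needed);
    return 0;
}
```

Here ~500 KB of live objects were pinning down 100MB of heap pages; after compaction a single page suffices.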


Side note: 48 bits of address space (actually 47 bits, one bit is for the kernel) is not as impossible to exhaust as it seems at first. When applications (like virtual machines) have ideas like "oh, we have near-infinite address space, so let's reserve a 4GB 'cage' of address space around this thing, which would allow us to play some interesting performance tricks / create some interesting security guarantees / etc", and then some use case wants thousands of whatever that thing is, then you can run into address space limits before you expect it.
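For the curious, here's a stripped-down sketch of the compressed-pointer idea behind such a cage (my own illustration; the compress/decompress helpers and the small malloc standing in for a real 4GB reservation are hypothetical):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: all objects live inside one contiguous "cage", so a 32-bit
   offset from the cage's base is enough to identify any of them. For
   V8's real design, see the pointer-compression post linked in the
   comments below. */
static uint8_t *cage_base;               /* base of the (conceptually 4GB) cage */

typedef uint32_t compressed_ptr;         /* half the size of a raw 64-bit pointer */

static compressed_ptr compress(void *p) {
    return (compressed_ptr)((uint8_t *)p - cage_base);
}

static void *decompress(compressed_ptr c) {
    return cage_base + c;
}

int main(void) {
    cage_base = malloc(1024);            /* small stand-in for a 4GB reservation */
    void *obj = cage_base + 128;         /* some "object" inside the cage */

    compressed_ptr c = compress(obj);
    printf("offset %u round-trips correctly: %s\n",
           c, decompress(c) == obj ? "yes" : "no");

    free(cage_base);
    return 0;
}
```

Storing 32-bit offsets instead of 64-bit pointers roughly halves the memory spent on pointer fields, which is exactly the kind of performance trick such a cage enables.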

jmrk
  • Did you mean 4TB cage rather than 4GB? – Dada Aug 09 '23 at 15:54
  • @Dada, no, I meant 4GB, the amount of address space addressable with a 32-bit offset from the cage's base address. (For example, that's the technique that V8 relies on for "[pointer compression](https://v8.dev/blog/pointer-compression)".) – jmrk Aug 09 '23 at 18:06
  • All right. Then exhausting the 128TB of address space 4GB at a time is still going to take a while... edit: ah, I re-read your last paragraph, and I guess it was partially inspired by wasm multi-memory and the fact that V8 reserves 4GB per memory, which can lead to a fairly large amount of memory reserved when someone wants to use thousands of memories... – Dada Aug 09 '23 at 20:50
  • The HotSpot JVM also uses compressed pointers, either direct (4GB maximum) or shifted (typically 32GB maximum). So it’s also the case that exceeding this logical address space has performance drawbacks. Also, the CPU cache utilization would drop dramatically if we allowed such heavy fragmentation. And well, if all pages mapped to physical memory are in use, using one more implies swapping, regardless of how much logical address space is available. – Holger Aug 21 '23 at 09:56
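A quick sanity check of the limits in Holger's comment (this is just the offset arithmetic, not HotSpot code):

```c
#include <stdint.h>
#include <stdio.h>

/* A 32-bit offset reaches 2^32 bytes directly; shifting it left by
   3 bits (possible when objects are 8-byte aligned) reaches 8x as far. */
int main(void) {
    uint64_t direct  = (uint64_t)UINT32_MAX + 1;   /* 2^32 bytes */
    uint64_t shifted = direct << 3;                /* 2^35 bytes */
    printf("direct : %llu GB\n", (unsigned long long)(direct  >> 30));  /* 4 GB */
    printf("shifted: %llu GB\n", (unsigned long long)(shifted >> 30));  /* 32 GB */
    return 0;
}
```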