6

I am dealing with huge amounts of data in Rust and while evaluating the memory usage of data structures, I stumbled upon some surprising results.

First, I tried filling a vector manually by just pushing zeroes to it:

let mut arr = vec![];

for _ in 0..(4_294_967_294 as u32) {
    arr.push(0);
}

After a while, quite expectedly I would say, my computer ran out of available memory and the process was killed by the OS.

However, if I initialise the vector using the vec! macro, the behaviour changes:

use rand::RngCore; // brings next_u32()/next_u64() into scope

let mut rng = rand::thread_rng(); // any RngCore source works; thread_rng() shown here
let mut arr = vec![0; 2_147_483_647_000];

for i in 1..1_000_000_000 {
    arr[i - 1] = rng.next_u64();

    let sample = rng.next_u32();
    let res = arr[sample as usize];
    if i % 10_000_000 == 0 {
        print!("After {} ", i);
        print!("Btw, arr[{}] == {} ", sample, res);
        print_allocated_memory(); // my helper that prints jemalloc's allocated/resident stats
    }
}

Even though I filled 1 billion entries with actual u64 values and read random values back out of the array (mostly zeroes; I just wanted to prevent the compiler from optimising the whole array away), my computer did not run out of memory.

Memory usage per jemalloc was this (please note that my computer only has 16 GB of RAM installed):

allocated 16777216.05 MiB resident 16777223.02 MiB

... whereas my OS reported a maximum of roughly 8000M (measured in htop) right at the end of the code.

Curiously, if I use any default value other than 0 (be it 1 or 100), the macro runs out of memory before the vector is even fully created, so it surely has something to do with the initial value being 0.
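For reference, the failing variant is just the same line with a non-zero initial value (I added the u64 suffix only for parity with the loop above):

let mut arr = vec![1u64; 2_147_483_647_000]; // killed by the OS during vector creation, before the loop is ever reached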

I wonder what the macro does to keep the resulting data structure so memory-efficient. Are the elements of the array not really created? And if not, how can reading out random indices from the vector still work?

I have already checked the documentation, but it only says that the element type has to implement Clone, which does not really tell me anything for primitive types.

Stargateur
techfly
  • Which OS did you do this test on? – cdhowie Apr 20 '22 at 21:46
  • Some OSes have large chunks of copy-on-write memory which are pre-initialized to zero. Perhaps Rust is using those as an optimization. – Silvio Mayolo Apr 20 '22 at 21:47
  • I tested on Pop OS 21.10 (It's an Ubuntu derivate) 64-bit with 16 GB of RAM. – techfly Apr 20 '22 at 21:48
  • This is due to how paging works. If the OS gave you a new page of memory without clearing it first, you can perform an attack where you steal data from other users/applications that have recently exited. To mitigate this, nearly every OS these days will make sure the old memory is unreadable in some form or another. While it doesn't need to be zeroed, I imagine it was the easiest option to implement and is probably backed by some hardware optimizations. – Locke Apr 20 '22 at 21:51
  • Paging also enables it to not need to give you all of the memory right away. It can instead choose to only dedicate hardware resources once you write to a page. Maybe by the time you actually get to writing to that memory some other processes will have exited and there will be enough space available. – Locke Apr 20 '22 at 21:55
  • The OS probably only gives you virtual memory for your second test – Stargateur Apr 20 '22 at 22:23

1 Answer

9

When allocating memory for a vector, there are a few allocation built-ins that can be used. When the element type is numeric and the initial value is zero, __rust_alloc_zeroed is used.
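As a rough sketch (this is not the standard library's literal code path, just the primitive such a vector bottoms out in; the 8 GiB size is only an example), a zero-initialised vector amounts to a single zeroed allocation request:

use std::alloc::{alloc_zeroed, dealloc, Layout};

fn main() {
    // Equivalent in spirit to `vec![0u64; 1 << 30]`: one zeroed 8 GiB request.
    let layout = Layout::array::<u64>(1 << 30).unwrap();
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null());
    // The call returns almost immediately and barely increases resident
    // memory, because no page has been written to yet.
    println!("zeroed allocation of {} bytes at {:p}", layout.size(), ptr);
    unsafe { dealloc(ptr, layout) };
}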

On Unix-compatible systems, the default implementation for this allocator function can use either calloc() or posix_memalign().

calloc() guarantees that the allocation is zeroed; posix_memalign() does not. If the latter is used, the Rust allocator will zero the memory itself.
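Sketched with the libc crate (the hypothetical zeroed_alloc helper and the alignment threshold are mine for illustration; the standard library's actual selection logic differs in its details), the distinction looks roughly like this:

use std::ptr;
use libc::{c_void, calloc, free, posix_memalign};

// Illustrative only: which branch is taken depends on the requested alignment
// relative to the alignment that malloc/calloc already guarantee.
unsafe fn zeroed_alloc(size: usize, align: usize) -> *mut u8 {
    if align <= 2 * std::mem::size_of::<usize>() {
        // calloc guarantees zeroed memory, so nothing has to be written here.
        calloc(1, size) as *mut u8
    } else {
        // posix_memalign makes no such guarantee, so the block would have to
        // be zeroed by hand -- touching (and thus backing) every page.
        let mut p: *mut c_void = ptr::null_mut();
        if posix_memalign(&mut p, align, size) != 0 {
            return ptr::null_mut();
        }
        ptr::write_bytes(p as *mut u8, 0, size);
        p as *mut u8
    }
}

fn main() {
    let p = unsafe { zeroed_alloc(4096, 64) };
    assert!(!p.is_null());
    unsafe { free(p as *mut c_void) };
}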

Given the behavior you've observed, the only plausible explanation is that calloc() is used. Since the request cannot be satisfied by the library from previously-freed memory (the allocation is certainly too large), it is passed on to the kernel, which creates entries in the process' page table for the requested allocation.

However, the OS doesn't have to actually assign a region of physical memory to each page in the allocation. It can defer this until later, a technique called overcommitment.
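To make that concrete, here is a minimal Linux-only sketch using the libc crate (not anything Rust's allocator literally does) that asks the kernel for far more anonymous, zero-filled memory than the machine has; whether such a request is accepted depends on the overcommit policy in /proc/sys/vm/overcommit_memory:

use std::ptr;
use libc::{mmap, munmap, MAP_ANONYMOUS, MAP_FAILED, MAP_PRIVATE, PROT_READ, PROT_WRITE};

fn main() {
    let size = 256usize << 30; // 256 GiB of address space
    let p = unsafe {
        mmap(ptr::null_mut(), size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
    };
    if p == MAP_FAILED {
        println!("mapping refused by the kernel (strict overcommit settings)");
    } else {
        // Only address-space bookkeeping exists at this point; no physical
        // memory is assigned until individual pages are actually touched.
        println!("mapped {} GiB at {:p} without using physical memory", size >> 30, p);
        unsafe { munmap(p, size); }
    }
}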

Reading from or writing to an address in the allocation region will trigger a page fault if it's not yet backed by physical memory. When this fault happens, the kernel resolves it by assigning a region of memory to the accessed page.

The end result of all of this is that if you create a vector of a numeric type where the initial value is zero, very little system memory is actually used by that allocation initially. Nearly all of the allocation is within pages that don't have backing system memory yet, analogous to a hole in a sparse file. As you write to the vector, the system will start assigning physical memory to the allocation and your used memory (and/or used swap) will start to increase.
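You can watch this happen from inside the process. The sketch below is Linux-specific and only illustrative (it reads VmRSS from /proc/self/status; the sizes are arbitrary): resident memory stays small after the zeroed allocation and grows once pages are written.

use std::fs;

fn rss_kib() -> Option<u64> {
    // The VmRSS line in /proc/self/status reports resident set size in KiB.
    fs::read_to_string("/proc/self/status")
        .ok()?
        .lines()
        .find(|line| line.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

fn main() {
    // 1 << 30 u64 elements = 8 GiB of virtual memory, requested zeroed.
    let mut v = vec![0u64; 1 << 30];
    println!("after vec![0u64; 1 << 30]: RSS = {:?} KiB", rss_kib());

    // Touch the first 1 GiB (1 << 27 elements); RSS grows by roughly that much.
    for i in 0..(1 << 27) {
        v[i] = i as u64;
    }
    println!("after writing 1 GiB:       RSS = {:?} KiB", rss_kib());
    println!("v[12345] = {}", v[12345]); // keep the vector observably used
}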

cdhowie
  • I think this is missing the part about the magic of the value zero: it's not just that `calloc` guarantees the memory is zeroed (which could be done by malloc-ing then zeroing) it's that `mmap` always hands out zeroed memory, which its callers (like malloc) know, and which the kernel can and will optimise for. Which is why it can hand out pages but not back them with any actual memory until they're dirtied (written to) and a "virtual" fully zeroed page is thus not sufficient. – Masklinn Apr 21 '22 at 06:30
  • @Masklinn I explained the magic of zero. This can't be fast with malloc because Rust guarantees zero-initialization, which malloc does not. Using malloc can return previously-allocated memory from the process' libc allocator, which may not be zeroed, so Rust would have to zero-init the whole allocation. calloc is the only way this _can_ be fast because of the zero-init guarantee. calloc's guarantee _and_ unbacked virtual memory pages are both required to make this work, and I explained both of those facets. – cdhowie Apr 21 '22 at 07:56
  • Addendum to this answer, which is entirely correct already: This demonstrates a quirk in Operating Systems which use overcommitment. The memory allocation itself may succeed even for completely ridiculous requests because the OS only actually backs the allocation page-by-page on first access. If it turns out that the OS runs out of free physical memory at some point, a program can actually segfault out of the blue. The upside is that huge allocations which are sparsely used can succeed without anyone noticing that the required physical memory was never there in the first place. – user2722968 Apr 21 '22 at 09:58
  • I see. Thanks for your explanation! – techfly Apr 21 '22 at 19:21