I am dealing with huge amounts of data in Rust, and while evaluating the memory usage of data structures I stumbled upon some surprising results.
First, I tried filling a vector manually by just pushing zeroes to it:
let mut arr = vec![];
for _ in 0..(4_294_967_294 as u32) {
    arr.push(0);
}
After a while, quite expectedly I would say, my computer ran out of available memory and the OS killed the process.
However, if I initialise the vector with the vec![elem; len] form of the macro instead, the behaviour changes:
let mut rng = rand::thread_rng(); // any RngCore implementation; `use rand::RngCore;` is in scope
let mut arr = vec![0; 2_147_483_647_000];
for i in 1..1_000_000_000 {
    arr[i - 1] = rng.next_u64();
    let sample = rng.next_u32();
    let res = arr[sample as usize];
    if i % 10_000_000 == 0 {
        print!("After {} ", i);
        print!("Btw, arr[{}] == {} ", sample, res);
        print_allocated_memory();
    }
}
Even though I filled 1 billion entries with actual u64 values and read values back at random indices (mostly zeroes; I only did this to stop the compiler from optimising the whole array away), my computer's memory did not overflow.
Memory usage per jemalloc was this (please note that my computer only has 16 GB of RAM installed):
allocated 16777216.05 MiB resident 16777223.02 MiB
... whereas my OS reported a maximum of roughly 8000M (as measured in htop) right at the end of the run.
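(For context, print_allocated_memory is a small helper that just prints the allocator statistics. A rough sketch of what it does, based on the jemallocator and jemalloc-ctl crates, is shown below; the exact implementation should not matter for the question.)

use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn print_allocated_memory() {
    // jemalloc caches its statistics; advancing the epoch refreshes them
    epoch::advance().unwrap();
    let allocated = stats::allocated::read().unwrap() as f64 / (1024.0 * 1024.0);
    let resident = stats::resident::read().unwrap() as f64 / (1024.0 * 1024.0);
    println!("allocated {:.2} MiB resident {:.2} MiB", allocated, resident);
}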
Curiously, if I use any default value other than 0 (be it 1 or 100), the macro runs out of memory before even finishing the vector creation, so it surely has something to do with the init value being 0. The smallest comparison I can reduce this to is the sketch below.
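(The length 10_000_000_000 here is just a value I picked to comfortably exceed my 16 GB of RAM; the exact threshold presumably depends on available memory and system settings.)

fn main() {
    // ~80 GB worth of u64 zeroes: on my machine this allocation succeeds,
    // and resident memory stays tiny as long as the elements are never written.
    let zeros: Vec<u64> = vec![0; 10_000_000_000];
    println!("created zero-filled vector of len {}", zeros.len());

    // The same size filled with 1 instead of 0 gets the process killed
    // by the OS before the vector is even fully constructed:
    // let ones: Vec<u64> = vec![1; 10_000_000_000];
    // println!("created one-filled vector of len {}", ones.len());
}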
I wonder what the macro does to keep the resulting data structure so memory-efficient. Are the elements of the vector not actually created? And if they are not, how can reading from random indices of the vector still just work?
I have already checked the documentation, but it only says that the element must implement Clone, which does not really tell me anything for primitive types.