I thought the memory occupied by a Rust HashMap was freed once the HashMap went out of scope, but in the following test that doesn't seem to happen.

use std::{collections::HashMap, thread, time::Duration};

fn func() {
    let mut map: HashMap<String, Vec<u8>> = HashMap::new();

    for i in 0..100_000 {
        map.insert(format!("{}", i), vec![0; 50_000]);
    }
}

fn main() {
    func();

    println!("LOOP");
    loop {
        thread::sleep(Duration::from_secs(5));
    }
}

Note: This program occupies about 5 GB of RAM. When it reaches the loop section, the memory is still occupied.

Performing an analogous test with a vector:

use std::{thread, time::Duration};

fn func() {
    let mut x: Vec<Vec<u8>> = Vec::new();
    for _ in 0..100_000 {
        x.push(vec![0; 50_000]);
    }
}
fn main() {
    func();
    println!("LOOP");

    loop {
        thread::sleep(Duration::from_secs(5));
    }
}

The result is very different: when it reaches the loop, the memory usage drops to zero.

  • Rust version 1.60
  • Both examples above are built in release mode

In this example (the key is an i32 instead of a String) the memory is released to the OS:

use std::{collections::HashMap, thread, time::Duration};

fn func() {
    let mut map: HashMap<i32, Vec<u8>> = HashMap::new();

    for i in 0..100_000 {
        map.insert(i, vec![0; 50_000]);
    }
}

fn main() {
    func();
    println!("LOOP");
    loop {
        thread::sleep(Duration::from_secs(5));
    }
}

UPDATE: Executing func in a thread seems to make the process release the memory.
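
The update above can be tried at reduced scale with a sketch like the following. `fill_map` and `fill_map_in_thread` are hypothetical names, not part of the original test; the sizes are parameters so that the experiment does not need 5 GB. Whether the memory goes back to the OS afterwards depends on the allocator and the platform, so this only helps reproduce the observation and does not explain it.

```rust
use std::{collections::HashMap, thread};

// Hypothetical helper: same shape as `func` in the question, but with the
// entry count and value size as parameters so the test can be scaled down.
fn fill_map(entries: usize, value_len: usize) -> usize {
    let mut map: HashMap<String, Vec<u8>> = HashMap::new();
    for i in 0..entries {
        map.insert(format!("{}", i), vec![0; value_len]);
    }
    // `map` and everything it owns is dropped when this function returns.
    map.len()
}

// Hypothetical helper: runs `fill_map` on a spawned thread and waits for it,
// so all allocations and frees happen on the worker thread.
fn fill_map_in_thread(entries: usize, value_len: usize) -> usize {
    thread::spawn(move || fill_map(entries, value_len))
        .join()
        .expect("worker thread panicked")
}
```

Calling `fill_map_in_thread(100_000, 50_000)` from `main` before the sleep loop corresponds to the sizes used in the question.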

willygroup
  • My guess is that this is a memory allocator issue. For some reason the allocator seems to return memory to the OS with vectors, but keeps it around for reuse with HashMap. This might be because HashMap has different memory patterns compared to Vec. Note that it is a common tactic for an allocator to keep memory around for reuse. – freakish Apr 20 '22 at 12:34
  • Freeing memory != releasing it to the OS. Once a program has grabbed some memory, it is very rare for the global allocator to give it back to the OS. I'm not fully aware of all the details, but unless the memory was allocated by something like a manual mmap of your own, it is very unlikely to be released. Your program will keep this pool until it stops. – Stargateur Apr 20 '22 at 12:35
  • Maybe the optimizer is able to get rid of the vector code completely, so that no allocation is performed in the first place, but cannot do so for the hash map, so that memory is allocated and then freed but not released to the OS… – Jmb Apr 20 '22 at 12:57
  • Nope: I just checked by dumping the contents of `/proc/self/statm` before allocating, before returning from `func` and after returning from `func`. The vector version does allocate almost as much memory as the hash map version, but the vector version releases the memory to the OS and the hash map version doesn't. Note however that calling `func` twice doesn't cause memory to increase, which proves that freed memory from the first call is reused in the second. – Jmb Apr 20 '22 at 14:01
  • Call `func` twice. If you see twice the usage, you have identified a real problem. If you don't, then it *may* be a problem in specific circumstances, but mostly it is not. – Kevin Anderson Apr 20 '22 at 14:13
  • Trace the memory usage during the appending, and perhaps the timing. It might be that the map allocates its entries incrementally as separate small objects, while the vector is allocated in one chunk that needs reallocations. The allocator might decide that releasing a single big chunk is a good thing. – the busybee Apr 20 '22 at 14:33
  • @thebusybee I thought so too, but most of the allocated memory should be taken up by the `vec![0; 50_000]` which should be allocated the same in the hash map case as in the vector case. – Jmb Apr 20 '22 at 14:54
  • @Jmb Do you mean that the "outer" vector stores just 100000 references of the same "inner" vector of 50000 bytes? – the busybee Apr 20 '22 at 16:07
  • Tested on vanilla Windows: memory is deallocated properly for the HashMap version. So, as we said, it is up to the global allocator to decide this; it isn't a bug in Rust. – Stargateur Apr 20 '22 at 16:10
  • @thebusybee no there are 100_000 instances of `vec![0; 50_000]` in both cases. My point was that those 100_000 vectors make up most of the allocated memory and there is no reason for them to be allocated in a single chunk in either case even if the allocator uses different strategies for the main container. – Jmb Apr 21 '22 at 06:35
  • @Jmb Well, I thought that 100_000 instances of 50_000 bytes give the memory footprint of 5 GB the OP talks about. ;-) – the busybee Apr 21 '22 at 06:59
  • @thebusybee Yes they do. The question is why is that memory footprint released to the OS in the vector case and not in the hash map case _even though most of the allocated memory is in the 100_000 vectors that should be the same in both cases_? – Jmb Apr 21 '22 at 07:04
  • @Jmb There might be some magic threshold for block size between 50_000 bytes (100_000 of them, the inner vectors), some 10 bytes (assuming that the map is some kind of linked list: a single entry in the map, 100_000 of them), and x00_000 bytes (the outer vector, just one). Who knows? This leads to another research idea... -- Anyway, the memory allocator is presumably not aware of the internal structure of allocated blocks, only of their sizes. – the busybee Apr 21 '22 at 09:13
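
The measurement described in the comments (reading `/proc/self/statm` before and after `func`) could be sketched roughly as below. This is an assumption about how such a check might look, not Jmb's actual code; `parse_statm_resident`, `resident_pages` and `report` are hypothetical names, and the file exists only on Linux. The second field of `statm` is the resident set size in pages.

```rust
use std::fs;

// Hypothetical helper: extracts the second whitespace-separated field of a
// statm line, which is the resident set size measured in pages.
fn parse_statm_resident(contents: &str) -> Option<u64> {
    contents.split_whitespace().nth(1)?.parse().ok()
}

// Linux-only: returns None where /proc/self/statm cannot be read or parsed.
fn resident_pages() -> Option<u64> {
    let contents = fs::read_to_string("/proc/self/statm").ok()?;
    parse_statm_resident(&contents)
}

// Prints the current resident set size with a label for the measurement point.
fn report(label: &str) {
    match resident_pages() {
        Some(pages) => println!("{}: {} resident pages", label, pages),
        None => println!("{}: /proc/self/statm not readable", label),
    }
}
```

Calling `report("before")`, then `func()`, `report("after first call")`, `func()` again and `report("after second call")` gives the data points discussed in the comments: if the second call does not raise the number, the freed memory is being reused.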