My program reads CSV files with the csv crate into a Vec<Vec<String>>, where the outer vector represents rows and each inner vector holds that row's columns.
use std::{thread::{park, sleep}, time};

fn main() {
    different_scope();
    println!("Parked");
    park();
}

fn different_scope() {
    println!("Reading csv");
    let _data = read_csv("data.csv");
    println!("Sleeping");
    sleep(time::Duration::from_secs(4));
    println!("Going out of scope");
}

fn read_csv(path: &str) -> Vec<Vec<String>> {
    let mut rdr = csv::Reader::from_path(path).unwrap();
    rdr.records()
        .map(|record| {
            record
                .unwrap()
                .iter()
                .map(|column| column.to_string())
                .collect()
        })
        .collect()
}
Watching RAM usage in htop, this uses 2.5GB of memory to read a 250MB CSV file. Here are the contents of cat /proc/<my pid>/status:
Name: (name)
Umask: 0002
State: S (sleeping)
Tgid: 18349
Ngid: 0
Pid: 18349
PPid: 18311
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 4 24 27 30 46 118 128 133 1000
NStgid: 18349
NSpid: 18349
NSpgid: 18349
NSsid: 18311
VmPeak: 2748152 kB
VmSize: 2354932 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 2580156 kB
VmRSS: 2345944 kB
RssAnon: 2343900 kB
RssFile: 2044 kB
RssShmem: 0 kB
VmData: 2343884 kB
VmStk: 136 kB
VmExe: 304 kB
VmLib: 2332 kB
VmPTE: 4648 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 0/127783
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180000440
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffffffff
Cpus_allowed_list: 0-31
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 9
nonvoluntary_ctxt_switches: 293
When I drop the variable, it frees roughly the expected amount (approx. 250MB), but about 2.2GB stays resident. Because of this, I can't read more than 2-3GB before all my memory is used and the process is killed (cargo prints "Killed").

How do I free the excess memory that accumulates while the CSV is being read?
I need to process every line. In this particular case I don't need to hold all the data at once, but what if I did?
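For the case where I don't need everything at once, I assume the right shape is to consume the records one at a time instead of collecting them, so peak memory stays at roughly one row. A std-only sketch of that shape (naive comma split, no quoting support — I'd consume the csv crate's records() iterator the same way in a for loop):

```rust
use std::io::{BufRead, BufReader, Cursor};

// Consume a CSV-like source one line at a time, never holding more than
// one row in memory. Returns (row_count, field_count) as a stand-in for
// whatever per-row processing I actually need.
fn stream_csv<R: BufRead>(reader: R) -> (usize, usize) {
    let mut rows = 0;
    let mut fields = 0;
    for line in reader.lines() {
        let line = line.unwrap();
        if line.is_empty() {
            continue;
        }
        // The row's String is dropped at the end of each iteration,
        // so nothing accumulates across the file.
        fields += line.split(',').count();
        rows += 1;
    }
    (rows, fields)
}

fn main() {
    // Hypothetical in-memory stand-in for data.csv.
    let data = "a,b,c\n1,2,3\n4,5,6\n";
    let (rows, fields) = stream_csv(BufReader::new(Cursor::new(data)));
    println!("{} rows, {} fields", rows, fields); // prints "3 rows, 9 fields"
}
```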
I asked a related question and was pointed to "What is Rust strategy to uncommit and return memory to the operating system?", which helped me understand the problem but not how to solve it.

My understanding is that I should switch to a different memory allocator, but brute-forcing through every allocator I can find feels like an ignorant approach.
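Mechanically the swap itself looks simple; for example, with jemalloc via the tikv-jemallocator crate it would be something like the sketch below (the crate version is assumed, and I don't know whether jemalloc actually returns memory to the OS more eagerly here — that's exactly what I'm unsure about):

```rust
// Cargo.toml: tikv-jemallocator = "0.5"   (assumed version)
use tikv_jemallocator::Jemalloc;

// Replaces the default global allocator for the whole program.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```

But picking which allocator to put here is the part I can't answer by trial and error.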