I am trying to perform a line-oriented operation on a gzip-compressed file that is bigger than available RAM, so reading it into a string first is out of the question. How can I do this in Rust (short of gunzip -c file.gz | ./my-rust-program)?
My current solution is based on flate2 and a bunch of buffered readers:
use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;

use flate2::bufread::GzDecoder as BufGzDecoder;

fn main() {
    let fname = "path_to_a_big_file.gz";
    let f = File::open(fname).expect("Ooops.");
    let bf = BufReader::new(f); // Here's the first reader so I can plug data into BufGzDecoder.
    let br = BufGzDecoder::new(bf); // Yep, here. But, oops, BufGzDecoder has no lines method,
                                    // so try to stick it into a std BufReader.
    let bf2 = BufReader::new(br); // What!? This works!? Yes it does.
    // After a long time ...
    eprintln!("count: {}", bf2.lines().count());
    // ... the line count is here.
}
To put the above into words: I noticed I cannot plug a File straight into flate2::bufread::GzDecoder, so I first created a std::io::BufReader instance, which is compatible with the constructor of the former. However, I did not see any useful iterator associated with flate2::bufread::GzDecoder, so I built another std::io::BufReader on top of it. Surprisingly, that worked: I got my Lines iterator, and it read the whole file in just over a minute on my machine. Still, it feels overly verbose and inelegant, and possibly inefficient (that is the part I am most worried about).
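One part of the efficiency question can be checked independently of the decoder: the Lines iterator allocates a fresh String and validates UTF-8 for every line. Below is a minimal sketch of a counting loop that reuses a single byte buffer instead. It uses only std; count_lines is a hypothetical helper name, not part of flate2, and it accepts anything that implements BufRead, so the outer BufReader (bf2 above) could presumably be passed to it as-is. Whether this is measurably faster for a given file is an assumption to benchmark, since decompression itself may dominate.

```rust
use std::io::{self, BufRead};

/// Counts lines from any buffered reader, reusing one byte buffer for all lines.
/// `count_lines` is a hypothetical helper introduced for illustration only.
pub fn count_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut buf = Vec::new(); // reused for every line, so no per-line allocation
    let mut count = 0;
    loop {
        buf.clear();
        // read_until appends bytes up to and including b'\n' (or up to EOF)
        // and returns the number of bytes read; 0 means end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        // A real line-oriented operation would process `buf` here.
        count += 1;
    }
    Ok(count)
}
```

With the readers from the question, the call would look like count_lines(bf2), returning an io::Result instead of silently counting lines that failed to read. Working on raw bytes also means lines that are not valid UTF-8 do not produce errors.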