
My Rust program is intended to read a very large (up to several GB), simple text file line by line. The problem is that the file is too large to be read all at once, or to load every line into a Vec<String>.

What would be an idiomatic way to handle this in Rust?

Boiethios
Piwo
  • 19
    @Kroltan - Not so much; this page is now the top ranking entry according to a Google search on "efficiently read lines from file in rust". (-: – Huw Walters Jan 07 '19 at 18:47
  • @HuwWalters It was not at the time of posting! Current readers will be directed to any of those duplicates. And I'd dare say that SO posts still qualify as "examples" so the comment is still up-to-date (-: – This company is turning evil. Jan 07 '19 at 23:52
  • That was my point; it's a useful page! (irony may have fallen a bit flat though) – Huw Walters Jan 09 '19 at 04:44
  • 19
    I don't agree that this question has already been answered on SO and certainly not by the links present. OP specifically requested a method for *very large* files where the file is too large to be read at once or read line-by-line, a more efficient method must exist, which makes this a valid question – Awalias Aug 20 '19 at 17:56
  • I searched for "rust read 1 line from a file" and arrived here. I must say, I'm very disappointed at how difficult Rust makes it to read a line from a file. It's only a few lines of code in C# – AriesConnolly Jul 18 '23 at 09:57

1 Answer


You want to use a buffered reader: wrap the file in a BufReader and iterate over it with lines(), a method provided by the BufRead trait:

use std::fs::File;
use std::io::{self, prelude::*, BufReader};

fn main() -> io::Result<()> {
    let file = File::open("foo.txt")?;
    let reader = BufReader::new(file);

    for line in reader.lines() {
        println!("{}", line?);
    }

    Ok(())
}

Note that the strings yielded by lines() do not include the line ending (a trailing \n or \r\n is stripped), as stated in the documentation.
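
To make the point about line endings concrete, here is a small sketch. It reads from an in-memory Cursor rather than a real file so that it runs anywhere. collect_lines is a hypothetical helper name, and collecting into a Vec is only for the demonstration: with a multi-GB file you would process each line inside the loop instead.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical helper: gathers all lines from any buffered reader,
/// stopping at the first I/O error. Only suitable for small inputs.
fn collect_lines<R: BufRead>(reader: R) -> std::io::Result<Vec<String>> {
    reader.lines().collect()
}

fn main() -> std::io::Result<()> {
    // In-memory stand-in for a file, mixing LF and CRLF line endings.
    let input = Cursor::new("first\nsecond\r\nthird");

    let lines = collect_lines(input)?;

    // Neither "\n" nor "\r\n" shows up in the yielded strings.
    assert_eq!(lines, ["first", "second", "third"]);
    Ok(())
}
```

If you need the line ending preserved, read_line keeps it in the buffer, whereas lines() does not.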


If you do not want to allocate a new String for each line (which is what lines() does), here is an example that reuses the same buffer:

fn main() -> std::io::Result<()> {
    let mut reader = my_reader::BufReader::open("Cargo.toml")?;
    let mut buffer = String::new();

    while let Some(line) = reader.read_line(&mut buffer) {
        println!("{}", line?.trim());
    }

    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);

            Ok(Self { reader })
        }

        pub fn read_line<'buf>(
            &mut self,
            buffer: &'buf mut String,
        ) -> Option<io::Result<&'buf mut String>> {
            buffer.clear();

            self.reader
                .read_line(buffer)
                .map(|u| if u == 0 { None } else { Some(buffer) })
                .transpose()
        }
    }
}

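
If the wrapper type is more than you need, the same buffer reuse can probably be had by calling the standard read_line directly in a loop. A minimal sketch, assuming each line is only needed while it is being processed. line_stats is a hypothetical helper, and the Cursor stands in for a BufReader wrapped around a real File:

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: returns (number of lines, byte length of the longest
/// line without its line ending), reusing one String for every line.
fn line_stats<R: BufRead>(mut reader: R) -> io::Result<(usize, usize)> {
    let mut buffer = String::new();
    let mut count = 0;
    let mut longest = 0;

    loop {
        // read_line appends to the buffer, so it must be cleared first.
        buffer.clear();

        // Ok(0) signals end of input.
        if reader.read_line(&mut buffer)? == 0 {
            break;
        }

        // read_line keeps the line ending; strip trailing CR/LF characters.
        let line = buffer.trim_end_matches(|c: char| c == '\n' || c == '\r');
        count += 1;
        longest = longest.max(line.len());
    }

    Ok((count, longest))
}

fn main() -> io::Result<()> {
    let (count, longest) = line_stats(Cursor::new("ab\ncdef\r\n\ng"))?;
    assert_eq!((count, longest), (4, 4));
    Ok(())
}
```

Because the function is generic over BufRead, it accepts a BufReader over a file, locked stdin, or an in-memory Cursor alike.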

Or if you prefer a standard iterator, you can use this Rc trick I shamelessly took from Reddit:

fn main() -> std::io::Result<()> {
    for line in my_reader::BufReader::open("Cargo.toml")? {
        println!("{}", line?.trim());
    }

    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
        rc::Rc,
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
        buf: Rc<String>,
    }
    
    fn new_buf() -> Rc<String> {
        Rc::new(String::with_capacity(1024)) // Tweakable capacity
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);
            let buf = new_buf();

            Ok(Self { reader, buf })
        }
    }

    impl Iterator for BufReader {
        type Item = io::Result<Rc<String>>;

        fn next(&mut self) -> Option<Self::Item> {
            let buf = match Rc::get_mut(&mut self.buf) {
                Some(buf) => {
                    buf.clear();
                    buf
                }
                None => {
                    self.buf = new_buf();
                    Rc::make_mut(&mut self.buf)
                }
            };

            self.reader
                .read_line(buf)
                .map(|u| if u == 0 { None } else { Some(Rc::clone(&self.buf)) })
                .transpose()
        }
    }
}

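
To see what the Rc buys you, the sketch below checks when the buffer is reused and when a fresh one is allocated. RcLines is a hypothetical, generic variant of the iterator above (it takes any BufRead so that it can be run on an in-memory Cursor), so treat it as an illustration rather than a drop-in replacement. Note that, like read_line, it keeps the line endings.

```rust
use std::io::{self, BufRead, Cursor};
use std::rc::Rc;

/// Hypothetical generic variant of the Rc-based iterator.
struct RcLines<R> {
    reader: R,
    buf: Rc<String>,
}

impl<R: BufRead> RcLines<R> {
    fn new(reader: R) -> Self {
        Self { reader, buf: Rc::new(String::new()) }
    }
}

impl<R: BufRead> Iterator for RcLines<R> {
    type Item = io::Result<Rc<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        // If the caller dropped the previous line, we hold the only handle
        // and can reuse the allocation; otherwise start a fresh buffer.
        let buf = match Rc::get_mut(&mut self.buf) {
            Some(buf) => {
                buf.clear();
                buf
            }
            None => {
                self.buf = Rc::new(String::new());
                Rc::make_mut(&mut self.buf)
            }
        };

        self.reader
            .read_line(buf)
            .map(|n| if n == 0 { None } else { Some(Rc::clone(&self.buf)) })
            .transpose()
    }
}

fn main() -> io::Result<()> {
    let mut lines = RcLines::new(Cursor::new("a\nb\nc\n"));

    // Drop the first line before asking for the next: the buffer is reused.
    let first = lines.next().unwrap()?;
    let first_ptr = Rc::as_ptr(&first);
    drop(first);
    let second = lines.next().unwrap()?;
    assert_eq!(Rc::as_ptr(&second), first_ptr);

    // Keep `second` alive: the iterator has to allocate a new buffer.
    let third = lines.next().unwrap()?;
    assert_ne!(Rc::as_ptr(&third), Rc::as_ptr(&second));
    assert_eq!(second.as_str(), "b\n");
    assert_eq!(third.as_str(), "c\n");
    Ok(())
}
```

In a plain for loop each line is dropped at the end of the iteration, so the reuse path is the one that runs; the extra allocation only happens when a line is stored somewhere beyond the loop body.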

Boiethios
  • 3
    Note that a new-line is considered to be a LF or a CR followed by a LF. – Noel Widmer Aug 25 '17 at 13:29
  • this isn't necessarily the most efficient way if the file size is greater than the available memory – Awalias Aug 20 '19 at 17:53
  • 3
    @Awalias I updated my answer to be clearer. If you don't understand something specific, please tell me what. – Boiethios Aug 22 '19 at 09:21
  • 7
    starting to learn Rust and this was super helpful and informative – King Friday Dec 20 '19 at 18:15
  • 1
    Theoretically, could the line `Rc::make_mut(&mut self.buf)` be replaced with `Rc::get_mut(&mut self.buf).unwrap()` since we know it's available because we just made a new buffer with `self.buf = new_buf();` I know it doesn't matter for this case but I'm implementing a buffer that stores the line into a struct from a csv. (And don't want to derive Clone for the struct). – financial_physician Dec 11 '22 at 01:00
  • What type does read_line return? It's not a string as I would have hoped for. – AriesConnolly Jul 18 '23 at 09:53
  • @AriesConnolly It's not a plain string, because sometimes the read can fail (`Some(Err(…))`), and after reading all the lines, it returns `None`. – Boiethios Jul 25 '23 at 13:38