How could I pack the following code into a single iterator?

use std::io::{BufRead, BufReader};
use std::fs::File;

let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));

for line in file.lines() {
   for ch in line.expect("Unable to read line").chars() {
      println!("Character: {}", ch);
   }
}

Naively, I’d like to have something like this (unwraps omitted):

let lines = file.lines().next();
Reader {
  line: lines,
  char: next().chars()
}

and iterate over Reader.char until it returns None, then refresh Reader.line to the next line and Reader.char to that line's character iterator. This doesn't seem to be possible, though, because Reader.char borrows from the temporary line.

Please note that the question is about nested iterators; reading text files is only used as an example.

Tim

2 Answers

You can use the flat_map() iterator adapter to create a new iterator that produces any number of items for each item of the iterator it's called on.

In this case, that's complicated by the fact that lines() returns an iterator of Results, so the Err case must be handled.

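There's also the issue that .chars() borrows from the string it iterates over rather than copying it. Here the String is owned by the closure and dropped when the closure returns, so the Chars iterator cannot be returned from it; you have to collect the characters into an owned container first.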

Solving both issues results in this mess:

fn example() -> impl Iterator<Item=Result<char, std::io::Error>> {
    let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
    
    file.lines().flat_map(|line| match line {
        Err(e) => vec![Err(e)],
        Ok(line) => line.chars().map(Ok).collect(),
    })
}

If String gave us an into_chars() method we could avoid collect() here, but then we'd have differently-typed iterators and would need to use either Box<dyn Iterator> or something like either::Either.
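As a rough sketch of the Box<dyn Iterator> approach, the two match arms can be boxed so they share one type. The helper name chars_of and the generic reader parameter are assumptions made for illustration, and the Ok arm still collects because the standard library offers no stable owning char iterator here:

```rust
use std::io::{self, BufRead};
use std::iter;

// Hypothetical helper: flattens lines into characters and passes read errors through.
fn chars_of<R: BufRead>(reader: R) -> impl Iterator<Item = io::Result<char>> {
    reader
        .lines()
        .flat_map(|line| -> Box<dyn Iterator<Item = io::Result<char>>> {
            match line {
                // A single-item iterator carrying the error.
                Err(e) => Box::new(iter::once(Err::<char, io::Error>(e))),
                // An owned iterator over the line's characters (still collected).
                Ok(line) => Box::new(
                    line.chars()
                        .collect::<Vec<_>>()
                        .into_iter()
                        .map(Ok::<char, io::Error>),
                ),
            }
        })
}
```

Boxing costs one allocation per line, so this mainly pays off once the Ok arm can hold a lazy owning iterator instead of a Vec.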

Since you already use .expect() here, you can simplify a bit by using .expect() within the closure to avoid handling the Err case:

fn example() -> impl Iterator<Item=char> {
    let file = BufReader::new(File::open("sample.txt").expect("Unable to open file"));
    
    file.lines().flat_map(|line|
        line.expect("Unable to read line").chars().collect::<Vec<_>>()
    )
}

In the general case, flat_map() is usually quite easy. You just need to be mindful of whether you are iterating owned vs borrowed values; both cases have some sharp corners. In this case, iterating over owned String values makes using .chars() problematic. If we could iterate over borrowed str slices we wouldn't have to .collect().
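If collecting each line is a concern, one possible lazy alternative is a small owning iterator that keeps the String together with a byte offset instead of a borrowed Chars. This is only a sketch: OwnedChars and lazy_chars are hypothetical names, not standard library items, and the reader is generic so the example does not depend on a file.

```rust
use std::io::BufRead;

// Hypothetical owning iterator: stores the String and the byte offset of the next char.
struct OwnedChars {
    s: String,
    pos: usize,
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // `pos` only ever advances by whole characters, so it stays on a
        // char boundary and the slice below cannot panic.
        let c = self.s[self.pos..].chars().next()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}

// Hypothetical helper: yields every character of every line without collecting.
fn lazy_chars<R: BufRead>(reader: R) -> impl Iterator<Item = char> {
    reader.lines().flat_map(|line| OwnedChars {
        s: line.expect("Unable to read line"),
        pos: 0,
    })
}
```

Because the struct owns its String and stores no reference into it, no self-referential borrow is involved and no unsafe code is needed.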

cdhowie
  • Thanks. Is there a lazy solution that avoids collecting? – Tim Jan 16 '23 at 20:13
  • @Tim I don't think there is one in the standard library. You would have to write your own "owning char iterator" based either on (1) functions in the standard library that are still experimental (`str::ceil_char_boundary()`) or (2) an "unsafe" self-referential struct, e.g. one holding the `String` as well as the `Chars` iterator that borrows from it. Rust normally disallows this because moving the struct would invalidate the reference. Since the actual content of the `String` is on the heap, this turns out to be safe in practice, but you have to convince Rust of that using `unsafe`. – cdhowie Jan 16 '23 at 21:13
  • I'll use something like this in my answer, but you helped me to get on track, so accepting it, thanks! – Tim Jan 17 '23 at 21:22

Drawing on the answer from @cdhowie and this answer that suggests using IntoIter to get an iterator of owned chars, I was able to come up with this solution that is the closest to what I expected:

use std::fs::File;
use std::io;
use std::io::{BufRead, BufReader, Lines};
use std::vec::IntoIter;

struct Reader {
    lines: Lines<BufReader<File>>,
    iter: IntoIter<char>,
}

impl Reader {
    fn new(filename: &str) -> Self {
        let file = BufReader::new(File::open(filename).expect("Unable to open file"));
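        let mut lines = file.lines();
        // Start with the first line, or with an empty iterator if the file has no lines,
        // so that an empty file does not panic.
        let iter = lines
            .next()
            .map(Reader::char_iter)
            .unwrap_or_else(|| Vec::new().into_iter());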
        Reader { lines, iter }
    }

    fn char_iter(line: io::Result<String>) -> IntoIter<char> {
        // Collect into a Vec so the returned iterator owns its characters.
        line.expect("Unable to read line").chars().collect::<Vec<_>>().into_iter()
    }
}

impl Iterator for Reader {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        match self.iter.next() {
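            None => {
                // Current line is exhausted: load the next one, or stop at end of input.
                self.iter = match self.lines.next() {
                    None => return None,
                    Some(line) => Reader::char_iter(line),
                };
                // lines() strips line terminators, so emit one between lines.
                Some('\n')
            }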
            Some(val) => Some(val),
        }
    }
}

It works as expected:

let reader = Reader::new("src/main.rs");
for ch in reader {
    print!("{}", ch);
}
Tim