
I have a string iterator lines that I get from stdin with

use std::io::{self, BufRead};

let stdin = io::stdin();
let lines = stdin.lock().lines().map(|l| l.unwrap());

The lines iterator yields values of type String, not &str. I want to create an iterator that iterates over the input words instead of lines. It seems like this should be doable but my naive attempt does not work:

let words = lines.flat_map(|l| l.split_whitespace());

The compiler tells me that l is being dropped while still borrowed, which makes sense:

error[E0597]: `l` does not live long enough
 --> src/lib.rs:6:36
  |
6 |     let words = lines.flat_map(|l| l.split_whitespace());
  |                                    ^                  - `l` dropped here while still borrowed
  |                                    |
  |                                    borrowed value does not live long enough
7 | }
  | - borrowed value needs to live until here

Is there some other clean way that accomplishes this?

  • I believe your question is answered by the answers of [How can I create an efficient iterator of chars from stdin with Rust?](https://stackoverflow.com/q/50394209/155423) and/or [Is there an owned version of String::chars?](https://stackoverflow.com/q/47193584/155423) / [How can I store a Chars iterator in the same struct as the String it is iterating on?](https://stackoverflow.com/q/43952104/155423), just replace `chars` with appropriate logic. – Shepmaster Dec 03 '18 at 19:36

1 Answer


In your example code, lines is an iterator over the lines read in from the reader you have obtained from stdin. As you say, it returns String instances, but you are not storing them anywhere.

split_whitespace is an inherent method on str (you can call it on a String because &String is deref-coerced to &str). It is defined like this:

pub fn split_whitespace(&self) -> SplitWhitespace<'_>

So it takes a reference to the string; it does not consume it. It returns an iterator that yields string slices (&str), which borrow portions of the string but do not own it.

The closure you pass to flat_map takes ownership of each String as l. As soon as the closure returns, nothing else owns l, so it is dropped. That would leave the &str values yielded by words dangling, hence the error.

One solution is to collect the lines into a vector, like this:

let lines: Vec<String> = stdin.lock().lines().map(|l| l.unwrap()).collect();

let words = lines.iter().flat_map(|l| l.split_whitespace());

The String instances are kept in the Vec<String>, which can live on so that the &str yielded by words have something to refer to.
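If holding all of the input in memory is acceptable anyway, a related variant is to read everything into a single String and split that once. This is only a sketch: read_all is a hypothetical helper name, not something from the question, and it reads the input as a whole rather than line by line.

```rust
use std::io::{self, Read};

// Hypothetical helper: reads everything from `reader` into one owned String.
// The caller keeps the String alive, so slices borrowed from it stay valid.
fn read_all<R: Read>(mut reader: R) -> io::Result<String> {
    let mut input = String::new();
    reader.read_to_string(&mut input)?;
    Ok(input)
}
```

With stdin this could be called as read_all(io::stdin()); input.split_whitespace() then yields &str slices that borrow from input for as long as it stays in scope, with no per-word copies.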

If there were a lot of lines, and you did not want to keep them all in memory, you might prefer to do it a line at a time:

let lines = stdin.lock().lines().map(|l| l.unwrap());

let words = lines.flat_map(|l| {
    l.split_whitespace()
        .map(|s| s.to_owned())
        .collect::<Vec<String>>()
        .into_iter()
});

Here the words of each line are collected into a Vec, a line at a time. The trade-off is lower overall memory consumption against the overhead of constructing a Vec<String> for each line and copying each word into it.

You might have been hoping for a zero-copy implementation that consumes the Strings that lines produces. I think that would be possible by writing a split_whitespace()-like function that takes ownership of the String and returns an iterator that owns the string.
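A rough sketch of that idea follows. OwnedWords and into_words are names made up for illustration, not a standard library API. Note that this version still copies each word into a new String, because the standard Iterator trait cannot yield a &str that borrows from the iterator itself; what it saves is the intermediate Vec<String> per line.

```rust
// Hypothetical helper: owns one line and yields its words as owned Strings.
struct OwnedWords {
    line: String,
    pos: usize, // byte offset where the unread remainder of `line` starts
}

impl OwnedWords {
    fn new(line: String) -> Self {
        OwnedWords { line, pos: 0 }
    }
}

impl Iterator for OwnedWords {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let rest = &self.line[self.pos..];
        // Skip whitespace in front of the next word.
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            self.pos = self.line.len();
            return None;
        }
        let start = self.pos + (rest.len() - trimmed.len());
        // The word ends at the next whitespace character, or at the end of the line.
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        let word = trimmed[..len].to_owned();
        self.pos = start + len;
        Some(word)
    }
}

// Turns an iterator of lines into an iterator of words, one line at a time.
fn into_words<I: Iterator<Item = String>>(lines: I) -> impl Iterator<Item = String> {
    lines.flat_map(OwnedWords::new)
}
```

With the lines iterator from the question, this would be used as let words = into_words(lines);.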

harmic
  • `std::string::String::split_whitespace` is a bit misleading, since `split_whitespace()` is in fact an inherent method on the primitive `str` type. The `&String` is converted to `&str` by a deref coercion. – Sven Marnach Dec 04 '18 at 09:14