As the title says, I have an Iterator of Strings and would like to produce a Vec<char>, just by concatenating all of them (and while considering each String as a sequence of Unicode Scalar Values, as .chars() does). (I don't need to interleave a string between the Strings or anything like that, although I understand it wouldn't be much harder with, e.g. itertools::intersperse, as covered in other answers.)

pub fn flatten_strings(ss: impl Iterator<Item=String>) -> Vec<char> {
    // ???
}

(It would be nice, but not particularly necessary, if I could implement the slightly more general type signature:)

pub fn flatten_strings<S>(ss: impl Iterator<Item=String>) -> S where S: FromIterator<char> {
    // ???
}

Anyway, here's a simple attempt:

pub fn flatten_strings(ss: impl Iterator<Item=String>) -> Vec<char> {
    ss.flat_map(|s| s.chars()).collect()
}

Unfortunately, this doesn't compile: chars() returns a Chars that borrows from the String it iterates over, but the closure takes ownership of that String and drops it when it returns.

error[E0515]: cannot return value referencing function parameter `s`
 --> src/main.rs:2:21
  |
2 |     ss.flat_map(|s| s.chars()).collect()
  |                     -^^^^^^^^
  |                     |
  |                     returns a value referencing data owned by the current function
  |                     `s` is borrowed here

I can fix this by collecting all the Strings into an intermediate vector, so that something owns them while I flat-map over an iterator of &String instead of String. But allocating that intermediate vector of Strings seems inefficient and somewhat defeats the point of using Rust's nice iterator abstractions:

pub fn flatten_strings(ss: impl Iterator<Item=String>) -> Vec<char> {
    ss.collect::<Vec<String>>().iter().flat_map(|s| s.chars()).collect()
}

Similarly, I could hand-roll a loop, but that also seems ugly. Can I implement this function efficiently, without additional allocations, but also without leaving Rust's iterator abstractions?

betaveros
  • Related: [How can I store a Chars iterator in the same struct as the String it is iterating on?](https://stackoverflow.com/questions/43952104/how-can-i-store-a-chars-iterator-in-the-same-struct-as-the-string-it-is-iteratin) – Sven Marnach Oct 06 '20 at 15:17
  • Another one: [Is there an owned version of String::chars?](https://stackoverflow.com/questions/47193584/is-there-an-owned-version-of-stringchars) – Sven Marnach Oct 06 '20 at 15:31
  • Can't you implement `flatten_strings(ss: impl Iterator) -> impl Iterator + '_` instead? – Sven Marnach Oct 06 '20 at 15:35
  • In my use cases, the `String`s in the input iterator are also intermediate values (for example, produced by `map`ping a different iterator with `format!`); I'm trying not to allocate space to hold all of them at once. – betaveros Oct 06 '20 at 16:40
  • Do the links above answer your question? – Sven Marnach Oct 06 '20 at 17:52
  • They do in the sense that I understand how any method of producing an owned version of String::chars would produce a corresponding solution to my problem (and one answer has been posted to that effect), but none of the solutions seem perfect to me (either additional allocations, or a LOT of additional code / an extra dependency). I was sort of hoping that the more specific goal of producing a self-owned Vec might enable some better workarounds for this ownership issue, I guess. – betaveros Oct 07 '20 at 12:32
  • Instead of creating a `Vec<char>`, you could also simply concatenate all the strings into a single string. This will usually consume a lot less memory: an ASCII character takes one byte in a UTF-8 string but four bytes in a `Vec<char>`, and English text consists mostly of ASCII characters. You can then use `concatenated_string.chars()` whenever you need it. – Sven Marnach Oct 07 '20 at 16:54
  • I know, and that's really easy (`String` implements `FromIterator<String>`), but it happens that I want random access to the `char`s (sort of; it's a bit contrived, but it's the reason I'm asking this question). – betaveros Oct 08 '20 at 02:25
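
The "owned version of String::chars" idea mentioned in the comments above can be sketched roughly as follows. This is only a minimal illustration, not something from the standard library: `OwnedChars` and `flatten_strings_owned` are hypothetical names made up for this example. The iterator owns its String and tracks a byte offset, so nothing borrowed has to escape the closure and no per-string Vec<char> is allocated.

```rust
/// Hypothetical helper (not part of std): an iterator that owns its `String`
/// and yields its `char`s one at a time.
pub struct OwnedChars {
    s: String,
    pos: usize, // byte offset of the next char; always on a char boundary
}

impl OwnedChars {
    pub fn new(s: String) -> Self {
        OwnedChars { s, pos: 0 }
    }
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Decode a single scalar value starting at the current byte offset;
        // `?` returns None once the remaining slice is empty.
        let c = self.s[self.pos..].chars().next()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}

/// Hypothetical name: flattens owned Strings without borrowing from them.
pub fn flatten_strings_owned(ss: impl Iterator<Item = String>) -> Vec<char> {
    // Each String is moved into its own OwnedChars, so there is no
    // reference to a value that the closure would drop.
    ss.flat_map(OwnedChars::new).collect()
}
```

The trade-off is the extra type and trait impl, which is the "additional code" cost raised in the comments.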

2 Answers

The main problem is that the s: String parameter in the closure is dropped before anyone can use the Chars object that depends on it.

Here are two ways to deal with that. The first is more verbose than your code, but uses the same type signature. The second takes an iterator of &str whose underlying strings outlive the function call, so the Chars objects can safely borrow from them.

pub fn flatten_strings(ss: impl Iterator<Item=String>) -> Vec<char> {
    let mut res = Vec::new();
    for s in ss {
        res.extend(s.chars());
    }
    res
}

pub fn flatten_strings2<'a>(ss: impl Iterator<Item=&'a str>) -> Vec<char> {
    ss.flat_map(|s| s.chars()).collect()
}
NovaDenizen
  • Hm, now that you wrote out the hand-written loop, I realized that you can easily generalize it to return `S: Default + Extend<char>`, which is implemented by many of the same things that implement `FromIterator<char>`. – betaveros Oct 07 '20 at 13:05
  • That's quite reasonable. – NovaDenizen Oct 07 '20 at 15:10
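
The generalization discussed in these comments might look something like the sketch below. The function names `flatten_strings_into` and `flatten_strings_fold` are made up for illustration. The first is the hand-written loop with the return type relaxed to any `S: Default + Extend<char>`; the second expresses the same loop with `fold`, which stays inside the iterator API without a per-string allocation.

```rust
/// Hypothetical name: the loop from the answer, generalized so the caller
/// picks the output collection (Vec<char>, String, HashSet<char>, ...).
pub fn flatten_strings_into<S>(ss: impl Iterator<Item = String>) -> S
where
    S: Default + Extend<char>,
{
    let mut res = S::default();
    for s in ss {
        // `s` lives until the end of this loop body, so borrowing it
        // through `chars()` is fine here.
        res.extend(s.chars());
    }
    res
}

/// Hypothetical name: the same loop written as a fold over the iterator.
pub fn flatten_strings_fold(ss: impl Iterator<Item = String>) -> Vec<char> {
    ss.fold(Vec::new(), |mut acc, s| {
        acc.extend(s.chars());
        acc
    })
}
```

With the generic version the caller selects the collection through a type annotation, for example `let v: Vec<char> = flatten_strings_into(strings.into_iter());`, where `strings` is assumed to be a `Vec<String>`.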

If you can live with a temporary allocation, you can convert each string to a Vec<char> as you progress through the list and hand that to flat_map. This makes the more general signature work too. You can go slightly more general still (as suggested by Sven Marnach) and return an impl Iterator<Item = char> without the collect, letting the caller collect if they want.

use std::collections::HashSet;
use core::iter::FromIterator;
pub fn flatten_strings(ss: impl Iterator<Item=String>) -> Vec<char> {
    ss.flat_map(|s| s.chars().collect::<Vec<_>>()).collect()
}

pub fn flatten_strings2<S>(ss: impl Iterator<Item=String>)  -> S where S: FromIterator<char>  {
    ss.flat_map(|s| s.chars().collect::<Vec<_>>()).collect()
}

pub fn flatten_strings3(ss: impl Iterator<Item=String>)  -> impl Iterator<Item = char>  {
    ss.flat_map(|s| s.chars().collect::<Vec<_>>())
}

fn main() {
    let v = vec!["A string ".to_string(), "another".to_string() ];
    println!("{:?}",flatten_strings(v.clone().into_iter()));
    let h: HashSet<char> = flatten_strings2(v.clone().into_iter());
    println!("{:?}",h);
    let h: HashSet<char> = flatten_strings3(v.clone().into_iter()).collect();
    println!("{:?}",h);
}
Michael Anderson
  • Instead of returning a type `S: FromIterator<char>`, you could also simply return an `impl Iterator<Item = char>` and let the caller use it in whatever way they choose. – Sven Marnach Oct 07 '20 at 16:56