I'd like to know if there's a way to cache an owned value between iterator adapters, so that adapters later in the chain can reference it. (Or if there's another way to allow later adapters to reference an owned value that lives inside the iterator chain.)

To illustrate what I mean, let's look at this (contrived) example:

I have a function that returns a String, which is called in an Iterator map() adapter, yielding an iterator over Strings. I'd like to get an iterator over the chars() in those Strings, but the chars() method requires a string slice, meaning a reference.

Is this possible to do, without first collecting the Strings?

Here's a minimal example that, of course, fails to compile:

fn greet(c: &str) -> String {
    "Hello, ".to_owned() + c
}

fn main() {
    let names = ["Martin", "Helena", "Ingrid", "Joseph"];
    let iterator = names.into_iter().map(greet);
    let fails = iterator.flat_map(<str>::chars);
}


Using a closure instead of <str>::chars, such as |s| s.chars(), of course doesn't work either. It makes the types match, but breaks lifetimes: the returned Chars would borrow from a String that is dropped when the closure returns.


Edit (2022-10-03): In response to the comments, here's some pseudocode of what I have in mind, but with incorrect lifetimes:

struct IteratorCache<'a, T, I>{
    item : Option<T>,
    inner : I,
    _p : core::marker::PhantomData<&'a T>
}

impl<'a, T, I> Iterator for IteratorCache<'a, T,I>
    where I: Iterator<Item=T>
{
    type Item=&'a T;
    fn next(&mut self) -> Option<&'a T> {
        self.item = self.inner.next();
        if let Some(x) = &self.item {
            Some(&x)
        } else {
            None
        }
    }
}

The idea would be that the reference could stay valid until the next call to next(). However, I don't know if this can be expressed with the function signature of the Iterator trait. (Or if this can be expressed at all.)

soulsource
  • What do you want to do with the characters? – Chayim Friedman Oct 02 '22 at 13:20
  • Is it acceptable to just collect the chars of the *current* string? E.g. `iterator.flat_map(|s| s.chars().collect::<Vec<_>>().into_iter())`: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bbaa3d84f928824d444c235072cdacfe – user4815162342 Oct 02 '22 at 13:23
  • @user4815162342 This will allocate more than collecting the strings. – Chayim Friedman Oct 02 '22 at 13:26
  • What's your expected outcome? The name `fails` of the output variable confuses me – Finomnis Oct 02 '22 at 13:33
  • @ChayimFriedman Will it really? It will convert each string to a `Vec`, but do so _on demand_ (as each string is created). E.g. if the iterator were to create a billion strings and used `count()` at the end, the largest allocation would be proportional to the char count of the largest string. OTOH if you were to "collect the strings" first, you'd have to allocate everything upfront. – user4815162342 Oct 02 '22 at 13:33
  • @user4815162342 Yes, but you allocate and deallocate a billion times instead of far fewer (~30, and only once if the iterator size is known). Unless you are constrained by memory I wouldn't do that. – Chayim Friedman Oct 02 '22 at 13:34
  • @Finomnis I understood the name `fails` as meant to convey that this approach currently fails to compile. – user4815162342 Oct 02 '22 at 13:34
  • @ChayimFriedman I suppose one could use `SmallVec` to reduce the number of allocations if necessary. The point of this approach is to support iterators of arbitrary size, which allocating everything in advance certainly doesn't. The OP explicitly says "without collecting the strings", presumably for the same reason. – user4815162342 Oct 02 '22 at 13:37
  • @user4815162342 Better than using a `SmallVec` is re-using the same `Vec`. It will allocate only a few times, but is boilerplate-y. – Chayim Friedman Oct 02 '22 at 13:38
  • If you were sure it was ASCII encoded then you could get the wrapped `Vec<u8>` and `into_iter` it, but for UTF-8 it's an interesting problem. Not sure if an off-the-shelf solution already exists – Finomnis Oct 02 '22 at 13:38
  • @ChayimFriedman Sure, but that'd require a lending iterator. – user4815162342 Oct 02 '22 at 13:38
  • Strongly related: https://stackoverflow.com/questions/47193584/is-there-an-owned-version-of-stringchars – Finomnis Oct 02 '22 at 13:39
  • @user4815162342 It cannot be an iterator indeed, but if the OP actually needs the ability to keep the `Vec`s then your solution isn't optimal either. – Chayim Friedman Oct 02 '22 at 13:41
  • @ChayimFriedman That's true - I was building on the OP's `flat_map(<str>::chars)` example, where the inner Vec (String) is not needed except for iteration over chars. – user4815162342 Oct 02 '22 at 13:41
  • My idea would have been to iterate over references into the string. Like: Have an iterator that owns the string itself, but allows iterating over subslices. The chars() example therefore wasn't a great one. Such a thing should be possible, as long as there's a copy somewhere later in the adapter chain. – soulsource Oct 02 '22 at 20:11
  • @user4815162342 "Lending Iterator" - that was the term I was looking for. Could you maybe make this an answer? (I haven't tried it yet, but https://docs.rs/lending-iterator/latest/lending_iterator/ looks very interesting) – soulsource Oct 02 '22 at 20:28
  • If the chars example isn't relevant, can you show a better one? I know it seems like it should be obvious what you're after, but there are many subtleties involved, and I'm no longer convinced I understand the exact use case. – user4815162342 Oct 02 '22 at 22:52
  • I'll update the question later (when I get home from work) with a pseudocode example of the tool I'm looking for. I was tired when I wrote the question and thought a good example would convey it better, but well, "tired" and "good example" don't match... – soulsource Oct 03 '22 at 06:29
  • Updated the question to include pseudocode of the "idea" I'm looking for. – soulsource Oct 03 '22 at 17:13

1 Answer

I don't think something like this exists yet, and collecting into a Vec<char> creates some overhead, but you can write such an iterator yourself with a little bit of trickery:

struct OwnedCharsIter {
    s: String,
    index: usize,
}

impl OwnedCharsIter {
    pub fn new(s: String) -> Self {
        Self { s, index: 0 }
    }
}

impl Iterator for OwnedCharsIter {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        // Slice of leftover characters
        let slice = &self.s[self.index..];

        // Iterator over leftover characters
        let mut chars = slice.chars();

        // Query the next char
        let next_char = chars.next()?;

        // Compute the new index by looking at how many bytes are left
        // after querying the next char
        self.index = self.s.len() - chars.as_str().len();

        // Return next char
        Some(next_char)
    }
}

fn greet(c: &str) -> String {
    "Hello, ".to_owned() + c
}

fn main() {
    let names = ["Martin", "Helena", "Ingrid", "Joseph"];
    let iterator = names.into_iter().map(greet);
    let chars_iter = iterator.flat_map(OwnedCharsIter::new);

    println!("{:?}", chars_iter.collect::<String>())
}

Output:

"Hello, MartinHello, HelenaHello, IngridHello, Joseph"
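
The index bookkeeping above can also be done with `char::len_utf8`, which reports how many bytes a char occupies in UTF-8. The following is only a sketch of that variant; the name `OwnedChars` is made up for illustration, and the greeting is built inline instead of calling `greet`. It should behave the same as `OwnedCharsIter`, including for multi-byte characters:

```rust
/// Hypothetical variant of the iterator above: owns a `String` and yields its chars.
struct OwnedChars {
    s: String,
    /// Byte offset of the next char; always kept on a char boundary.
    index: usize,
}

impl OwnedChars {
    fn new(s: String) -> Self {
        Self { s, index: 0 }
    }
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Decode the char that starts at the current byte offset, if any.
        let c = self.s[self.index..].chars().next()?;
        // Advance by the number of bytes this char occupies in UTF-8.
        self.index += c.len_utf8();
        Some(c)
    }
}

fn main() {
    // Multi-byte characters work because the index always moves by whole chars.
    let text = "Grüße, 世界";
    let collected: String = OwnedChars::new(text.to_owned()).collect();
    assert_eq!(collected, text);

    let names = ["Martin", "Helena"];
    let greetings = names.iter().map(|n| format!("Hello, {n}"));
    let joined: String = greetings.flat_map(OwnedChars::new).collect();
    println!("{joined}");
}
```

Either way, each call to `next` only decodes a single char, so no intermediate `Vec<char>` is allocated.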
Finomnis
  • Thanks! This answer made me realize that I picked a bad example.... What I'd rather want to do is to have references into the string available in the iterator adapter chain. This should be possible, I think, as long as before the chain ends there's a copy. – soulsource Oct 02 '22 at 20:14
  • I just read the comments directly under my question. The answer I was looking for is "Lending Iterator". – soulsource Oct 02 '22 at 20:29
  • Most certainly is a bad example then ;) feel free to close/delete the question and retry. I think it's too late to go a completely new direction with this question. – Finomnis Oct 02 '22 at 22:24
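
For reference, the "lending iterator" idea from the comments can be sketched without an external crate, assuming a compiler with generic associated types (stable since Rust 1.65). The `LendingIterator` trait below is a hand-written, hypothetical definition, not the API of the `lending-iterator` crate and not part of the standard library. `IteratorCache` follows the pseudocode from the question, with the lifetime moved from the struct onto the associated type:

```rust
/// Hypothetical lending-iterator trait (not part of the standard library).
/// The generic associated type ties each item to the borrow of `self`,
/// so an item must go out of use before `next` can be called again.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Caches the most recent owned value of `inner` and lends a reference to it.
struct IteratorCache<I: Iterator> {
    item: Option<I::Item>,
    inner: I,
}

impl<I: Iterator> IteratorCache<I> {
    fn new(inner: I) -> Self {
        Self { item: None, inner }
    }
}

impl<I: Iterator> LendingIterator for IteratorCache<I> {
    type Item<'a> = &'a I::Item
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        // Replace the cached value, dropping the previous one.
        self.item = self.inner.next();
        self.item.as_ref()
    }
}

fn greet(name: &str) -> String {
    format!("Hello, {name}")
}

fn main() {
    let names = ["Martin", "Helena"];
    let mut cache = IteratorCache::new(names.iter().map(|n| greet(n)));
    let mut total_chars = 0;
    // `for` loops require `Iterator`, so a lending iterator is driven with `while let`.
    while let Some(s) = cache.next() {
        // `s` borrows from `cache`; copy out what is needed before the next call.
        total_chars += s.chars().count();
    }
    println!("{total_chars}");
}
```

The trade-off is that this type does not implement `Iterator`, so standard adapters such as `flat_map` are not available on it; anything that must outlive the next call to `next` has to be copied out first.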