1

I have a function that takes a collection of &T (represented by IntoIterator) with the requirement that every element is unique.

fn foo<'a, 'b, T: std::fmt::Debug, I>(elements: &'b I)
where
    &'b I: IntoIterator<Item = &'a T>,
    T: 'a,
    'b: 'a,

I would like to also write a wrapper function which can work even if the elements are not unique, by using a HashSet to remove the duplicate elements first.

I tried the following implementation:

use std::collections::HashSet;

fn wrap<'a, 'b, T: std::fmt::Debug + Eq + std::hash::Hash, J>(elements: &'b J)
where
    &'b J: IntoIterator<Item = &'a T>,
    T: 'a,
    'b: 'a,
{
    let hashset: HashSet<&T> = elements.into_iter().into_iter().collect();
    foo(&hashset);
}

playground

However, the compiler doesn't seem happy with my assumption that HashSet<&T> implements IntoIterator<Item = &'a T>:

error[E0308]: mismatched types
  --> src/lib.rs:10:9
   |
10 |     foo(&hashset);
   |         ^^^^^^^^ expected type parameter, found struct `std::collections::HashSet`
   |
   = note: expected type `&J`
              found type `&std::collections::HashSet<&T>`
   = help: type parameters must be constrained to match other types
   = note: for more information, visit https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters

I know I could use a HashSet<T> by cloning all the input elements, but I want to avoid unnecessary copying and memory use.

Jeremy Salwen
  • 8,061
  • 5
  • 50
  • 73
  • 2
    *Why* are you doing all that convoluted lifetime fuss? It certainly seems like you've dug a hole and just kept going. From the code provided, there's no reason to have any of the lifetimes; remember that a generic type `X` can hold **within it** a reference. [example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d6f80f8e83ffcccd5f2339c1730a7af8). You may also be interested in [How do I write the lifetimes for references in a type constraint when one of them is a local reference?](https://stackoverflow.com/q/44343166/155423) – Shepmaster Dec 23 '19 at 20:34
  • 1
    A [more direct example that works with your provided `main`](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e176c4e00582b0b2fddb653d2b0ffece). – Shepmaster Dec 23 '19 at 20:48
  • I started adding lifetimes because the compiler told me to add lifetimes if I didn't have them: [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0bf13ef0bfee15715d7a75a811a32e8d). If I try to use the fact that a generic type can be a reference, then it wants to perform a move rather than a borrow: [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=898983e831d7da218958464d9c39b176) – Jeremy Salwen Dec 23 '19 at 21:14
  • 1
    In both of those examples, you have `&` still in the function signature / bounds. The [example I provided](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e176c4e00582b0b2fddb653d2b0ffece) does not use them at all; why is it not sufficient? – Shepmaster Dec 23 '19 at 21:16
  • Good to know about rustfmt in the playground, I rely on it heavily normally. I think I realize what my confusion was. I thought the compiler was complaining that I needed to have `I::Item : Copy` (which I really don't want), but actually it was just complaining I needed `I: Copy`. Your original example doesn't have the `Copy` bound and thus would hit that same error if you try to iterate twice. Your updated example does include the `Copy` bound and thus solves the problem. – Jeremy Salwen Dec 23 '19 at 21:23
  • Also, my original case (before I reduced it to this example) these functions were inside a trait implementation. I [can't do it exactly the same way](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e6e0b8bbe531a742039365daecd706fc) because T is now a type parameter instead of a trait. I [tried using Deref](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2283963e6ccd23cbfb4f3cb07c66bfab) to work around this, but the compiler still seems unhappy because `&Deref` doesn't implement `Deref`? – Jeremy Salwen Dec 23 '19 at 22:40
  • 1
    this code doesn’t make sense. You provide a reference to the vector, which yields references to the items, but expect to get owned values back, without ever copying or cloning the values. You also cannot use `PhantomData` like that – Shepmaster Dec 23 '19 at 23:47
  • Sure, you can understand the code from the perspective of the compiler, and that's why it doesn't make sense to you. But I don't know how the compiler will parse it, and that's why I'm asking for help. In this case, what do you mean "expect to get owned values back". The only thing the functions return is Collection, which only contains data derived from T, not an actual owned T value. Why can't I use `PhantomData` "like that"? What is "like that"? – Jeremy Salwen Dec 24 '19 at 06:12
  • Playing around in the sandbox, I came to the conclusion that the root of my problem is still basically the same as the title of this question: How do I use a `&HashSet<&T>` as a `IntoIterator`? Regardless of how much I clean up other issues with my code, that is the issue which remains. I was able to solve this problem by creating a helper class: [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=861a6bfa3dbd998bf3fd0164ef3187f7) – Jeremy Salwen Dec 24 '19 at 09:20
  • *The only thing the functions return is `Collection`, which only contains data derived from `T`, not an actual owned `T` value*. — That's not correct. The first sentence of the [`PhantomData`](https://doc.rust-lang.org/std/marker/struct.PhantomData.html) documentation addresses that (emphasis mine): *Zero-sized type used to mark things that **"act like" they own a `T`**.* – Shepmaster Dec 24 '19 at 13:57

2 Answers2

2

If you have a &HashSet<&T> and need an iterator of &T (not &&T) that you can process multiple times, then you can use Iterator::copied to convert the iterator's &&T to a &T:

use std::{collections::HashSet, fmt::Debug, hash::Hash, marker::PhantomData};

struct Collection<T> {
    item: PhantomData<T>,
}

impl<T> Collection<T>
where
    T: Debug,
{
    fn foo<'a, I>(elements: I) -> Self
    where
        I: IntoIterator<Item = &'a T> + Clone,
        T: 'a,
    {
        for element in elements.clone() {
            println!("{:?}", element);
        }
        for element in elements {
            println!("{:?}", element);
        }
        Self { item: PhantomData }
    }
}

impl<T> Collection<T>
where
    T: Debug + Eq + Hash,
{
    fn wrap<'a, I>(elements: I) -> Self
    where
        I: IntoIterator<Item = &'a T>,
        T: 'a,
    {
        let set: HashSet<_> = elements.into_iter().collect();
        Self::foo(set.iter().copied())
    }
}

#[derive(Debug, Hash, PartialEq, Eq)]
struct Foo(i32);

fn main() {
    let v = vec![Foo(1), Foo(2), Foo(4)];
    Collection::<Foo>::wrap(&v);
}

See also:


Note that the rest of this answer made the assumption that a struct named Collection<T> was a collection of values of type T. OP has clarified that this is not true.

That's not your problem, as shown by your later examples. That can be boiled down to this:

struct Collection<T>(T);

impl<T> Collection<T> {
    fn new(value: &T) -> Self {
        Collection(value)
    }
}

You are taking a reference to a type (&T) and trying to store it where a T is required; these are different types and will generate an error. You are using PhantomData for some reason and accepting references via the iterator, but the problem is the same.

In fact, PhantomData makes the problem harder to see as you can just make up values that don't work. For example, we never have any kind of string here but we "successfully" created the struct:

use std::marker::PhantomData;

struct Collection<T>(PhantomData<T>);

impl Collection<String> {
    fn new<T>(value: &T) -> Self {
        Collection(PhantomData)
    }
}

Ultimately, your wrap function doesn't make sense, either:

impl<T: Eq + Hash> Collection<T> {
    fn wrap<I>(elements: I) -> Self
    where
        I: IntoIterator<Item = T>,

This is equivalent to

impl<T: Eq + Hash> Collection<T> {
    fn wrap<I>(elements: I) -> Collection<T>
    where
        I: IntoIterator<Item = T>,

Which says that, given an iterator of elements T, you will return a collection of those elements. However, you put them in a HashMap and iterate on a reference to it, which yields &T. Thus this function signature cannot be right.

It seems most likely that you want to accept an iterator of owned values instead:

use std::{collections::HashSet, fmt::Debug, hash::Hash};

struct Collection<T> {
    item: T,
}

impl<T> Collection<T> {
    fn foo<I>(elements: I) -> Self
    where
        I: IntoIterator<Item = T>,
        for<'a> &'a I: IntoIterator<Item = &'a T>,
        T: Debug,
    {
        for element in &elements {
            println!("{:?}", element);
        }
        for element in &elements {
            println!("{:?}", element);
        }

        Self {
            item: elements.into_iter().next().unwrap(),
        }
    }
}

impl<T> Collection<T>
where
    T: Eq + Hash,
{
    fn wrap<I>(elements: I) -> Self
    where
        I: IntoIterator<Item = T>,
        T: Debug,
    {
        let s: HashSet<_> = elements.into_iter().collect();
        Self::foo(s)
    }
}

#[derive(Debug, Hash, PartialEq, Eq)]
struct Foo(i32);

fn main() {
    let v = vec![Foo(1), Foo(2), Foo(4)];
    let c = Collection::wrap(v);
    println!("{:?}", c.item)
}

Here we place a trait bound on the generic iterator type directly and a second higher-ranked trait bound on a reference to the iterator. This allows us to use a reference to the iterator as an iterator itself.

See also:

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
  • Your assumption that my collection of T must actually hold an owned value of type T is wrong. The data structure is a hash bashed probabilistic data structure, so it does not actually store an owned value of type T, but yet it is important for type safety for it to keep track of the type T. I just checked how the bloomfilter library implements it, and [they use exactly the same pattern](https://docs.rs/bloomfilter/1.0.1/src/bloomfilter/lib.rs.html#29-36). How else are you supposed to have type safety for this type of data structure? – Jeremy Salwen Dec 24 '19 at 18:33
  • Your example also subtly changes the question from using a `&HashSet<&T>` as an `IntoIterator` into the other, much easier problems of using a `&HashSet<&T>` as `IntoIterator` or using a `&HashSet` as `IntoIterator`. Because the functions are independently generic over T, the fact that the `T` in `foo` is different from the `T` In `wrap` in your code is hidden. In reality, your example uses a `&HashSet<&T>` as `IntoIterator`, which as you pointed out, is already implemented by the standard library. – Jeremy Salwen Dec 24 '19 at 18:48
  • I really do appreciate your feedback, and it would have been **completely impossible** for me to solve this problem without your assistance, but as I have learned more, I still think that the wrapper class is necessary in my use case (I would be happy to be proven wrong). Rather than dumping my whole codebase here, I tried to post a small self contained reproduction, but I didn't know exactly what was relevant and what was superfluous. That is why my original question didn't make all the requirements clear (I don't know they were relevant!). I appreciate your patience and help. – Jeremy Salwen Dec 24 '19 at 18:58
  • 1
    @JeremySalwen updated. You are correct that I mismatched the `&T` and `&&T`, but luckily that's easily fixable. You are also correct that I made assumptions about your original code based on the names provided (I would argue that a bloomfilter, etc. are **not** *collection*s). – Shepmaster Dec 24 '19 at 19:52
  • Thanks, your latest answer solves the core of my issue: .copied() converts the `&&T` to `&T`, and I can pass the iterator directly to implement`IntoIterator`. There is only one nit that I have which is that the implementation will accept any type `I` that implements `Clone`. So for example if someone passes in `Vec`, it will happily clone the `Vec` instead of complaining that the user should use `&Vec` instead. I think I can work around this by using the `Iterator` trait everywhere instead of `IntoIterator`. – Jeremy Salwen Dec 24 '19 at 22:39
0

There were a number of orthogonal issues with my code that Shepmaster pointed out, but to solve the issue of using a HashSet<&T> as an IntoIterator<Item=&T>, I found that one way to solve it is with a wrapper struct:

struct Helper<T, D: Deref<Target = T>>(HashSet<D>);

struct HelperIter<'a, T, D: Deref<Target = T>>(std::collections::hash_set::Iter<'a, D>);

impl<'a, T, D: Deref<Target = T>> Iterator for HelperIter<'a, T, D>
where
    T: 'a,
{
    type Item = &'a T;
    fn next(&mut self) -> Option<Self::Item> {
        self.0.next().map(|x| x.deref())
    }
}

impl<'a, T, D: Deref<Target = T>> IntoIterator for &'a Helper<T, D> {
    type Item = &'a T;
    type IntoIter = HelperIter<'a, T, D>;
    fn into_iter(self) -> Self::IntoIter {
        HelperIter((&self.0).into_iter())
    }
}

Which is used as follows:

struct Collection<T> {
    item: PhantomData<T>,
}

impl<T: Debug> Collection<T> {
    fn foo<I>(elements: I) -> Self
    where
        I: IntoIterator + Copy,
        I::Item: Deref<Target = T>,
    {
        for element in elements {
            println!("{:?}", *element);
        }
        for element in elements {
            println!("{:?}", *element);
        }
        return Self { item: PhantomData };
    }
}

impl<T: Debug + Eq + Hash> Collection<T> {
    fn wrap<I>(elements: I) -> Self
    where
        I: IntoIterator + Copy,
        I::Item: Deref<Target = T> + Eq + Hash,
    {
        let helper = Helper(elements.into_iter().collect());
        Self::foo(&helper);
        return Self { item: PhantomData };
    }
}

fn main() {
    let v = vec![Foo(1), Foo(2), Foo(4)];
    Collection::<Foo>::wrap(&v);
}

I'm guessing that some of this may be more complicated than it needs to be, but I'm not sure how.

full playground

Jeremy Salwen
  • 8,061
  • 5
  • 50
  • 73