How to remove duplicates from a vector of structures?

Question

I have a struct and a vector of them:

enum MySize {
    Big,
    Small,
    Medium,
    Huge,
}

struct MyStruct {
    my_size: MySize,
    field1: String,
    field2: u64,
    field3: f64,
}

let mut my_structs: Vec<MyStruct> = get_data_with_duplicates();
//how to remove duplicates from 'my_structs'?

I'm aware of sort_by and dedup_by, but I only know how to use them with primitive types. In my case these methods can't be applied as is, right?

How do I remove the duplicates, then?

Your struct has to implement the `PartialEq` Trait, and it will work — Unlikus, Sep 11 '19 at 11:48
[The duplicates applied to your question](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=21d28580f231a710ed185d6e526bebd9) — Shepmaster, Sep 11 '19 at 14:02
Since you have a floating point number, you might be interested in [Using max_by_key on a vector of floats](https://stackoverflow.com/q/37127209/155423); [How to do a binary search on a Vec of floats?](https://stackoverflow.com/q/28247990/155423); [How can I use a HashMap with f64 as key in Rust?](https://stackoverflow.com/q/39638363/155423). — Shepmaster, Sep 11 '19 at 14:21
@Shepmaster `^^^^^^^^^^ no implementation for `MyStruct == MyStruct``. That is, `PartialOrd` isn't implemented — tajara, Sep 11 '19 at 15:28
@tajara that doesn't explain anything, unfortunately. The [code I provided you](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=21d28580f231a710ed185d6e526bebd9) calls `dedup` on a vector of your structs, as you've asked. It compiles successfully. `==` uses [`PartialEq`](https://doc.rust-lang.org/std/cmp/trait.PartialEq.html), not `PartialOrd`, so I don't follow your point. Perhaps you can use [the Rust Playground](https://play.rust-lang.org/) to **show me** how you have modified the working code to make it non-working? — Shepmaster, Sep 11 '19 at 15:36
@Shepmaster from my project: https://i.postimg.cc/3NL1drsq/Captura-de-pantalla-2019-09-11-a-la-s-19-46-43.png — tajara, Sep 11 '19 at 15:47
@Shepmaster https://i.postimg.cc/DwbVkRsh/Captura-de-pantalla-2019-09-11-a-la-s-19-51-41.png — tajara, Sep 11 '19 at 15:52
that is, why do you call "dedup" without calling "sort" first? — tajara, Sep 11 '19 at 16:11

score 2 · Answer 1 · edited Oct 20 '21 at 08:39

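The dedup family of functions will not help you without a ton of extra work, as they only remove neighboring duplicates. sort and sort_unstable require Ord, which in turn requires Eq; with the f64 field in your struct neither can be derived, because f64 implements only PartialEq and PartialOrd (sort_by sidesteps this, but needs a comparator you write yourself). That leaves you with the following options: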

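You can implement Eq and Hash on your struct and then convert your Vec into a HashSet. Both can be #[derive]d when every field implements them; that is not the case for field3: f64, so for this struct you would need manual implementations or a hashable stand-in for the float. This gives you a very performant set primitive out of the box, though a HashSet does not preserve the original order.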

You can implement a naive filter (playground link) which keeps an internal state of all the visited items and simply filters based on it. It uses O(N) extra memory and, because each contains call is a linear scan, O(N^2) comparisons in the worst case. The only requirements for this method are PartialEq and Clone, both of which are easy to #[derive]. This can be optimized further. The current code is below, implemented as an extension trait to allow you to reuse it wherever you like:

  trait Dedup<T: PartialEq + Clone> {
      fn clear_duplicates(&mut self);
  }

  impl<T: PartialEq + Clone> Dedup<T> for Vec<T> {
      fn clear_duplicates(&mut self) {
          // Clones of every item kept so far, in order of first appearance.
          let mut already_seen: Vec<T> = Vec::new();
          self.retain(|item| {
              if already_seen.contains(item) {
                  // Seen before: drop this occurrence.
                  false
              } else {
                  // First occurrence: remember it and keep it.
                  already_seen.push(item.clone());
                  true
              }
          })
      }
  }
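Returning to the HashSet option: since f64 blocks #[derive(Eq, Hash)] on MyStruct, one possible workaround is to keep a HashSet of hashable keys rather than of the structs themselves. The sketch below is only an illustration under that assumption; dedup_by_key_set is a hypothetical helper name, and f64::to_bits is used as the stand-in for the float. Note that comparing bit patterns treats 0.0 and -0.0 as different and identical NaN patterns as equal, which differs from ==.

```rust
use std::collections::HashSet;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySize {
    Big,
    Small,
    Medium,
    Huge,
}

#[derive(Debug, Clone, PartialEq)]
struct MyStruct {
    my_size: MySize,
    field1: String,
    field2: u64,
    field3: f64,
}

// Hypothetical helper (not part of the original answer): removes duplicates
// while keeping the first occurrence of each item in its original position.
fn dedup_by_key_set(items: &mut Vec<MyStruct>) {
    let mut seen = HashSet::new();
    items.retain(|s| {
        // Assumption: two floats count as "the same" when their bit patterns
        // match. `insert` returns false if the key was already in the set,
        // which makes `retain` drop the repeated item.
        seen.insert((s.my_size, s.field1.clone(), s.field2, s.field3.to_bits()))
    });
}
```

Calling dedup_by_key_set(&mut my_structs) does one hash lookup per item, so the expected cost grows linearly with the length of the vector instead of quadratically.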

There are other implementations/ways of doing it, but those are some of the quickest.
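One such alternative is the sort-then-dedup route mentioned in the question. The sketch below is not from the original answer: compare and sort_and_dedup are hypothetical names, and the field-by-field ordering is an arbitrary choice made for illustration. It relies on f64::total_cmp (stable since Rust 1.62) to supply the total order that f64 does not provide through Ord, and it does not preserve the original order of the vector.

```rust
use std::cmp::Ordering;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum MySize {
    Big,
    Small,
    Medium,
    Huge,
}

#[derive(Debug, Clone, PartialEq)]
struct MyStruct {
    my_size: MySize,
    field1: String,
    field2: u64,
    field3: f64,
}

// Hypothetical comparator: orders by each field in turn. Enum variants
// compare in declaration order (Big < Small < Medium < Huge).
fn compare(a: &MyStruct, b: &MyStruct) -> Ordering {
    a.my_size
        .cmp(&b.my_size)
        .then_with(|| a.field1.cmp(&b.field1))
        .then_with(|| a.field2.cmp(&b.field2))
        .then_with(|| a.field3.total_cmp(&b.field3))
}

fn sort_and_dedup(items: &mut Vec<MyStruct>) {
    // Sorting places equal items next to each other...
    items.sort_by(compare);
    // ...so removing neighboring duplicates now removes all duplicates.
    // The same comparator is reused so "equal" means the same in both steps.
    items.dedup_by(|a, b| compare(a, b) == Ordering::Equal);
}
```

This answers the "why sort first?" concern from the comments: dedup on an unsorted vector only collapses runs of adjacent equal items.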

Nice `clear_duplicates` function! As an optimization, the `already_seen` vector could be constructed with `Vec::with_capacity(self.len())` to avoid resizing the vector while processing. For vectors with a lot of duplicates this would increase memory overhead, but if there are few duplicates it should decrease CPU instructions from vector resizing. — Danilo Bargen, Jan 28 '21 at 23:58
