
I have a struct and a vector of them:

enum MySize {
    Big,
    Small,
    Medium,
    Huge,
}

struct MyStruct {
    my_size: MySize,
    field1: String,
    field2: u64,
    field3: f64,
}

let mut my_structs: Vec<MyStruct> = get_data_with_duplicates();
//how to remove duplicates from 'my_structs'?

I'm aware of sort_by and dedup_by, but I only know how to use them with primitive types. In my case these methods can't be applied as is, right?

How do I remove the duplicates, then?

Boiethios
tajara
  • Your struct has to implement the `PartialEq` trait, and it will work – Unlikus Sep 11 '19 at 11:48
  • Why cannot you use those functions? – Boiethios Sep 11 '19 at 13:53
  • [The duplicates applied to your question](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=21d28580f231a710ed185d6e526bebd9) – Shepmaster Sep 11 '19 at 14:02
  • Since you have a floating point number, you might be interested in [Using max_by_key on a vector of floats](https://stackoverflow.com/q/37127209/155423); [How to do a binary search on a Vec of floats?](https://stackoverflow.com/q/28247990/155423); [How can I use a HashMap with f64 as key in Rust?](https://stackoverflow.com/q/39638363/155423). – Shepmaster Sep 11 '19 at 14:21
  • @Shepmaster that doesn't work – tajara Sep 11 '19 at 14:28
  • @tajara show me how it doesn't work – Shepmaster Sep 11 '19 at 14:29
  • @Shepmaster ``^^^^^^^^^^ no implementation for `MyStruct == MyStruct` ``. That is, `PartialOrd` isn't implemented – tajara Sep 11 '19 at 15:28
  • @tajara that doesn't explain anything, unfortunately. The [code I provided you](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=21d28580f231a710ed185d6e526bebd9) calls `dedup` on a vector of your structs, as you've asked. It compiles successfully. `==` uses [`PartialEq`](https://doc.rust-lang.org/std/cmp/trait.PartialEq.html), not `PartialOrd`, so I don't follow your point. Perhaps you can use [the Rust Playground](https://play.rust-lang.org/) to **show me** how you have modified the working code to make it non-working? – Shepmaster Sep 11 '19 at 15:36
  • @Shepmaster from my project: https://i.postimg.cc/3NL1drsq/Captura-de-pantalla-2019-09-11-a-la-s-19-46-43.png – tajara Sep 11 '19 at 15:47
  • @Shepmaster https://i.postimg.cc/DwbVkRsh/Captura-de-pantalla-2019-09-11-a-la-s-19-51-41.png – tajara Sep 11 '19 at 15:52
  • that is, why do you call "dedup" without calling "sort" first? – tajara Sep 11 '19 at 16:11
  • @Shepmaster your cooooode won't wooooork – tajara Sep 12 '19 at 05:07

1 Answer


The dedup family of functions will not help you without a ton of extra work, as they only remove consecutive duplicates, so the vector has to be sorted first. Sorting in turn needs a total order: sort requires Ord, which requires Eq as well, and sort_by needs a comparator that you write yourself. As such, you have two simpler options open to you:

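  • You can implement Eq and Hash on your struct and then convert your Vec into a HashSet. Both traits can be #[derive]d, but only if every field implements them. f64 implements neither, so with field3: f64 you would have to write the implementations by hand or wrap the float in a type that provides them. This gives you a very performant set primitive out of the box, although a HashSet does not preserve the original order of the elements.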

  • You can implement a naive filter (playground link) with a memory complexity of O(N) and a worst-case processing complexity of O(N²), since every item is compared against all the items kept so far. It keeps an internal state of all the visited items and simply filters based on it. The only requirements for this method are PartialEq and Clone, both of which are easy to #[derive]. This can be optimized further. The current code is below, implemented as an extension trait to allow you to reuse it wherever you like:

      trait Dedup<T: PartialEq + Clone> {
          fn clear_duplicates(&mut self);
      }

      impl<T: PartialEq + Clone> Dedup<T> for Vec<T> {
          fn clear_duplicates(&mut self) {
              // Clones of every distinct item encountered so far.
              let mut already_seen = Vec::new();
              // Keep an item only the first time it appears.
              self.retain(|item| {
                  if already_seen.contains(item) {
                      false
                  } else {
                      already_seen.push(item.clone());
                      true
                  }
              })
          }
      }
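
For the first option (the HashSet one), a minimal sketch might look like the following. Because `f64` implements neither `Eq` nor `Hash`, this sketch assumes it is acceptable to compare `field3` by its bit pattern via `f64::to_bits`, which treats `0.0` and `-0.0` as different values. That choice and the helper name `dedup_with_hash_set` are assumptions for illustration, not part of the question.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySize {
    Big,
    Small,
    Medium,
    Huge,
}

#[derive(Debug, Clone)]
struct MyStruct {
    my_size: MySize,
    field1: String,
    field2: u64,
    field3: f64,
}

// Assumption: two structs are equal when `field3` has the same bit pattern.
// Using `to_bits` in both `eq` and `hash` keeps the two implementations consistent.
impl PartialEq for MyStruct {
    fn eq(&self, other: &Self) -> bool {
        self.my_size == other.my_size
            && self.field1 == other.field1
            && self.field2 == other.field2
            && self.field3.to_bits() == other.field3.to_bits()
    }
}

impl Eq for MyStruct {}

impl Hash for MyStruct {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.my_size.hash(state);
        self.field1.hash(state);
        self.field2.hash(state);
        self.field3.to_bits().hash(state);
    }
}

// Hypothetical helper: keeps the first occurrence of each value, in the original order.
fn dedup_with_hash_set(items: Vec<MyStruct>) -> Vec<MyStruct> {
    let mut seen = HashSet::new();
    items
        .into_iter()
        // `insert` returns true only when the value was not already in the set.
        .filter(|item| seen.insert(item.clone()))
        .collect()
}
```

Filtering through the set, rather than collecting into it and back, keeps the surviving elements in their original order at the cost of one clone per element.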
    

There are other ways of doing it, but these are some of the quickest to put in place.
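
One such alternative, if the original order of the elements does not matter, is to sort first and then drop consecutive duplicates. The sketch below is only an illustration: `compare` and `sort_and_dedup` are hypothetical names, the field-by-field ordering is an arbitrary choice, and it assumes a toolchain where `f64::total_cmp` is available (Rust 1.62 or later).

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum MySize {
    Big,
    Small,
    Medium,
    Huge,
}

#[derive(Debug, Clone)]
struct MyStruct {
    my_size: MySize,
    field1: String,
    field2: u64,
    field3: f64,
}

// Hypothetical total order over all fields.
// `f64::total_cmp` gives floats a total order, NaN included.
fn compare(a: &MyStruct, b: &MyStruct) -> Ordering {
    a.my_size
        .cmp(&b.my_size)
        .then_with(|| a.field1.cmp(&b.field1))
        .then_with(|| a.field2.cmp(&b.field2))
        .then_with(|| a.field3.total_cmp(&b.field3))
}

// Sorting places equal elements next to each other, which is what `dedup_by` needs.
fn sort_and_dedup(items: &mut Vec<MyStruct>) {
    items.sort_by(compare);
    items.dedup_by(|a, b| compare(a, b) == Ordering::Equal);
}
```

This runs in O(N log N) time and needs neither `Hash` nor `Clone`, but the vector ends up sorted rather than in its original order.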

sno2
Sébastien Renauld
  • Nice `clear_duplicates` function! As an optimization, the `already_seen` vector could be constructed with `Vec::with_capacity(self.len())` to avoid resizing the vector while processing. For vectors with a lot of duplicates this would increase memory overhead, but if there are few duplicates it should decrease CPU instructions from vector resizing. – Danilo Bargen Jan 28 '21 at 23:58