2

I have generated an array of numbers. I would like to remove the duplicates. In javascript I can just use [...new Set(arr)] and get the job done.

In Rust, I havn't found a simple way to achieve that so far.

I've written:

use rand::{thread_rng, Rng};
use itertools::Itertools;

fn main() {
    let mut arr:Vec<u8> = Vec::new();
    for _ in 0..10 {
        arr.push(thread_rng().gen_range(0..10))
    }
    println!("random {:?}", arr);
    arr.iter().unique();
    println!("unique {:?}", arr);
}

The output are:

random [7, 0, 3, 6, 7, 7, 1, 1, 8, 6]
unique [7, 0, 3, 6, 7, 7, 1, 1, 8, 6]

So I've tried to get the "no duplicate" result in another variable:

let res = &arr.iter().unique();

The result was:

Unique { iter: UniqueBy { iter: Iter([1, 2, 0, 0, 7, 0, 2, 2, 1, 6]), used: {} } }

Also, it seems I can't sort the array before performing the removal of duplicate. This code returns an error: no method named 'iter' found for unit type '()' in the current scope method not found in '()'.

arr.sort().iter().unique();

Also, maybe there is a way to achieve the sort+unique value output without external crates?

jthulhu
  • 7,223
  • 2
  • 16
  • 33
DoneDeal0
  • 5,273
  • 13
  • 55
  • 114
  • Note that the "unit type" exists in other languages as `void` but other languages don't allow you to create a value of type `void`. This means when you declare a function in Rust with no return type, it implicitly returns `()`. The declarations `fn foo() { }` and `fn foo() -> () { }` are equivalent. So if you try to chain method calls and you get this error, one of those method calls is returning `()` which means you're probably mistaken about how the method works. (In this case, `.sort()` sorts the array _in-place_ and doesn't return anything.) – cdhowie Jul 09 '22 at 14:02

4 Answers4

7

Using the standard library

Usually, sorting an array is an ok way of deduplicating it, but, except if you are using a radix sort (which is not the sorting method Rust uses), it's asymptotically better to do what you would do in JS. Here is the Rust equivalent:

let a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
let uniqued_vector = a_vector
    .into_iter()
    .collect::<HashSet<_>>()
    .into_iter()
    .collect::<Vec<_>>();

This will turn your array into an iterator, then that iterator into a HashSet (which will deduplicate it), then back again into an iterator form, then finally into an array.

See it on the playground.


If you wonder why we have to have to go back and forth between these iterator representations, it's because they are the "interface" Rust uses to transform any datatype into any other datatype, very efficiently and while allowing you to perform some operations along the way easily. Here we don't actually need to do anything more than the conversion so that's why it may seem a little bit verbose.

Using the itertools crate

The itertools crate provides utilities to work on iterators (the same that we use as an interface to convert between datatypes). However, a peculiarity of iterators is that they are lazy, in a way, in the sense that they, in themselves, are not a datatype used to store information. They only represent operations performed, through the iterable interface, on a collection. For this reason, you actually need to transform an iterator back into a usable collection (or consume it in any way), otherwise it will do nothing (literally).

So the correct version of your code would probably be

let a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
let uniqued_vector = a_vector
    .into_iter()
    .unique()
    .collect::<Vec<_>>();

You don't need to sort anything because, internally, .unique() works pretty much like the first implementation.

Sorting the array

As said earlier, sorting the array is fine, so you might still want to do that. However, unlike previous solutions, this won't involve only iterators because you can't sort an iterator (there is no such method provided by the Iterator trait, nor by the actual type produced by a_vector.into_iter())! However, once you have sorted the array, you may want to deduplicate it, that is, remove consecutive repetitions, which is also not provided by the Iterator trait. However, both of these are actually simply provided by Vec, so the solution is simply:

let mut a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
a_vector.sort_unstable();
a_vector.dedup();

And then a_vector contains unique elements.

Note that this is only true if you use only the standard library. Itertools provides both a sorted method and a dedup one, so with itertools you could do:

let a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
let uniqued_vector = a_vector
    .into_iter()
    .sorted_unstable()
    .dedup()
    .collect::<Vec<_>>();

But at this point you'd be better off using .unique().


If you wonder what the difference between .iter() and .into_iter(), see this question.

Chayim Friedman
  • 47,971
  • 5
  • 48
  • 77
jthulhu
  • 7,223
  • 2
  • 16
  • 33
  • Thank you. So it works, but I can't add a sort() after the second into_iter() nor collect(). How so? Also, why doesn't `arr.iter().unique()` doesn't work? It doesn't make sense. – DoneDeal0 Jul 09 '22 at 16:44
  • 1
    @DoneDeal0 I edited my answer, maybe it will help you understand. – jthulhu Jul 09 '22 at 17:12
  • Amazing answer, I'm sorry I can't upvote you twice. Thank you so much for your clear explanations and your time. Indeed, I had a misconception about the Iterator trait. Coming from JS, Rust is much more demanding haha. – DoneDeal0 Jul 09 '22 at 17:53
1

Inspired from the example of retain(), we can rely on an intermediate sequence of booleans (see solution_1() below).

I don't like very much the idea of an intermediate storage, but I guess (I should have benchmarked, but I did not) that this simple vector of booleans is less expensive than creating a HashSet or sorting.

For an in-place solution, I'm afraid we have to write the algorithm by ourselves. The trick relies on split_at_mut() in order to avoid a multiple-borrows issue (see solution_2() below).

use rand::{thread_rng, Rng};

fn solution_1(mut arr: Vec<u8>) -> Vec<u8> {
    println!("random {:?}", arr);
    let keep: Vec<_> = arr
        .iter()
        .enumerate()
        .map(|(i, x)| !arr[0..i].contains(x))
        .collect();
    let mut keep_iter = keep.iter();
    arr.retain(|_| *keep_iter.next().unwrap());
    println!("unique {:?}", arr);
    arr
}

fn solution_2(mut arr: Vec<u8>) -> Vec<u8> {
    println!("random {:?}", arr);
    let mut kept = 0;
    for i in 0..arr.len() {
        let (head, tail) = arr.split_at_mut(i);
        let x = tail.first_mut().unwrap();
        if !head[0..kept].contains(x) {
            if kept != i {
                std::mem::swap(&mut head[kept], x);
            }
            kept += 1;
        }
    }
    arr.resize(kept, Default::default());
    println!("unique {:?}", arr);
    arr
}

fn main() {
    let mut arr: Vec<u8> = Vec::new();
    for _ in 0..20 {
        arr.push(thread_rng().gen_range(0..10))
    }
    assert_eq!(solution_1(arr.clone()), solution_2(arr));
}
/*
random [2, 3, 6, 8, 4, 1, 1, 9, 1, 9, 1, 6, 2, 5, 5, 4, 0, 0, 5, 4]
unique [2, 3, 6, 8, 4, 1, 9, 5, 0]
random [2, 3, 6, 8, 4, 1, 1, 9, 1, 9, 1, 6, 2, 5, 5, 4, 0, 0, 5, 4]
unique [2, 3, 6, 8, 4, 1, 9, 5, 0]
*/
quornian
  • 9,495
  • 6
  • 30
  • 27
prog-fh
  • 13,492
  • 1
  • 15
  • 30
0

Several things:

  1. .unique() returns an iterator. You have to turn it back into a vector with .collect(). This section of the Rust book contains the basic rules around using iterators

  2. .sort() modifies the vector in place. Unit type means an expression has no type, functions can sometimes not have any output and just have side effects (like modyfing something in place). Try:

    arr.sort();

    let uniques = arr.iter().unique().collect()

0

If you don't need the original insertion order you can get the array sorted w/o duplicates:

let unique = std::collections::BTreeSet::from([7, 0, 3, 6, 7, 7, 1, 1, 8, 6]);
let arr = unique.into_iter().collect::<Vec<_>>();
println!("{arr:?}");

Playground

Kaplan
  • 2,572
  • 13
  • 14