2

Basically I have a function to filter all string with the longest length. In order to avoid error expected struct 'String', found '&String', I've tried both of the following solution. Both of them gave the correct result.

With into_iter

fn allLongestStrings(inputArray: Vec<String>) -> Vec<String> {
    let max_len = inputArray.iter().map(|string| string.len()).max().unwrap();
    inputArray.into_iter().filter(|string| string.len().eq(&max_len)).collect()
}

With iter

fn allLongestStrings(inputArray: Vec<String>) -> Vec<String> {
    let max_len = inputArray.iter().map(|string| string.len()).max().unwrap();
    inputArray.iter().filter(|string| string.len().eq(&max_len)).map(|s| s.to_string()).collect()
}

The into_iter looks cleaner, but in this answer, it said that the result of into_iter is sometimes surprising.

  1. So, which approach above should I choose?
  2. And, is there any speed performance difference?
Herohtar
  • 5,347
  • 4
  • 31
  • 41
R.yan
  • 2,214
  • 1
  • 16
  • 33
  • 5
    The version with into_iter is better, because you do not allocate unnecessary copies of the strings – Svetlin Zarev Sep 23 '21 at 16:03
  • A good crate for timing code is [timeit](https://docs.rs/timeit/0.1.2/timeit/), I've found doing my own benchmarks to be the best way to get answers on the performance of different approaches. – Todd Sep 23 '21 at 18:35
  • 2
    The caveat with `.into_iter()` is, when applied specifically to an array (and not a `Vec`), it will yield references to items. This is how it works with the stable build now; but that will change in the future. Right now, the nightly Rust build has this change and `.into_iter()` will produce non-reference items. Again, this applies only to arrays and not `Vec`'s. `.into_iter()` applied to a `Vec`' works as you'd expect. – Todd Sep 23 '21 at 19:55

1 Answers1

2

Given that both methods take the vector by value, then the first version (with into_iter()) is better because it does not make copies of the strings.

When a string matches the condition, it is moved into the destination vector. While with the second version (with iter()), when a string matches the predicate, it is being copied and the copy is moved in the destination vector.This results in more allocations and more deallocations.

Here is simple criterion benchmark to demonstrate that:

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion}; //0.3.5

criterion_group!(benches, benchmark);
criterion_main!(benches);

pub fn benchmark(c: &mut Criterion) {
    let input = vec![
        "some string".to_owned(),
        "some string".to_owned(),
        "some string".to_owned(),
        "some string".to_owned(),
        "some string".to_owned(),
        "some string".to_owned(),
        "some string".to_owned(),
    ];

    c.bench_with_input(BenchmarkId::new("with_iter", ""), &input, |b, i| {
        b.iter(|| {
            let result = with_iter(i.clone());
            black_box(result);
        });
    });

    c.bench_with_input(BenchmarkId::new("with_into_iter", ""), &input, |b, i| {
        b.iter(|| {
            let result = with_into_iter(i.clone());
            black_box(result);
        });
    });
}

fn with_into_iter(input_array: Vec<String>) -> Vec<String> {
    let max_len = input_array.iter().map(|string| string.len()).max().unwrap();
    input_array
        .into_iter()
        .filter(|string| string.len() == max_len)
        .collect()
}

fn with_iter(input_array: Vec<String>) -> Vec<String> {
    let max_len = input_array.iter().map(|string| string.len()).max().unwrap();
    input_array
        .iter()
        .filter(|string| string.len() == max_len)
        .map(|s| s.to_string())
        .collect()
}

This is obviously the extreme case when all the values will be copied. Depending on your input if you have very few short strings to copy, the difference might be negligible.

with_iter/              time:   [392.30 ns 394.80 ns 397.93 ns]                       
with_into_iter/         time:   [153.07 ns 153.41 ns 153.79 ns]                            

Obviously the timings depend on the number of strings to copy and their length, so the actual times depend on the function's input.

PS: And if you want to be as fast as possible, you can use retain():

fn with_retain(mut input_array: Vec<String>) -> Vec<String> {
    let max_len = input_array.iter().map(|string| string.len()).max().unwrap();
    input_array.retain(|string| string.len() == max_len);
    input_array
}
with_retain/      time:   [143.98 ns 144.22 ns 144.50 ns]                               

It is faster because:

  • it does not allocate a second "result" vector
  • if there are a lot of matches, then the result vector would not need to be resized (because the output length will always be <= the input length), thus avoiding additional allocations/deallocations & copying

The downside is that if there are very few matches on long input arrays, then the output vector will be too big, thus wasting a bit of memory.

Svetlin Zarev
  • 14,713
  • 4
  • 53
  • 82
  • If you don't want to waste space with the `retain` version, you can use [`shrink_to_fit`](https://doc.rust-lang.org/1.54.0/std/vec/struct.Vec.html#method.shrink_to_fit) to free up the wasted space. – Jmb Sep 24 '21 at 07:23