How do I repeat some elements in a vector based on a condition?

Question

I encountered this problem during a kata. My more readable implementation was the following:

use std::vec::Vec;

fn repeat_even(v: Vec<i32>) -> Vec<i32> {
    v.into_iter().flat_map(|x| match x % 2 { 0 => vec![x, x], _ => vec![x] }).collect()
}

fn main() {
    let v = vec![1, 2, 3, 4, 6];
    assert_eq!(repeat_even(v), vec![1, 2, 2, 3, 4, 4, 6, 6]);
}

I have two questions about it:

Is it necessary to create another Vec? Is is possible to use the same Vec, i.e. modify it while iterating?
My solution is, I imagine, inefficient: I allocate a lot of vectors, and I have no guarantee that this will be optimized. Is there a better solution: readable and with less allocation?

ljedrz · Answer 1 · 2017-05-20T07:18:20.383

5

You could do it within the same vector, but it would require moving the rest of the vector (after the doubled number) every time you encounter an even number, which is inefficient. It would be better to do it using a new vector and a simple loop:

fn main() {
    let v = vec![1, 2, 3, 4, 6];

    let mut v2 = Vec::with_capacity(v.len() + v.iter().filter(|&n| n % 2 == 0).count());

    for n in v {
        v2.push(n);
        if n % 2 == 0 { v2.push(n) }
    }

    assert_eq!(v2, vec![1, 2, 2, 3, 4, 4, 6, 6]);
}

This solution allocates memory only once with the exact space required to hold all the numbers, including doubled evens.

edited May 20 '17 at 07:18

answered May 20 '17 at 07:10

ljedrz

20,316
4
69
97

This is the straight-forward solution, but we lose the functional smell. I love the possibility to create a collection with one line, like with haskell. – Boiethios May 20 '17 at 07:23
@Boiethios: it is tempting, true, but note that efficient (especially when it comes to vectors) Haskell rarely looks that way :). – ljedrz May 20 '17 at 08:00

score 4 · Accepted Answer · edited May 23 '17 at 12:34

flat_map expects iterators, so you can return an iterator of the values:

use std::iter;

fn double_even(v: &[i32]) -> Vec<i32> {
    v.iter().flat_map(|&x| {
        let count = if x % 2 == 0 { 2 } else { 1 };
        iter::repeat(x).take(count)
    }).collect()
}

fn main() {
    let v = vec![1, 2, 3, 4, 6];
    assert_eq!(double_even(&v), vec![1, 2, 2, 3, 4, 4, 6, 6]);
}

Things to note:

There's no reason to have use std::vec::Vec. It's already imported via the prelude.
You aren't making use of the memory allocation of the passed-in vector, so there's no reason to take it. See also Why is it discouraged to accept a reference to a String (&String) or Vec (&Vec) as a function argument?
Don't use vec!(); use vec![] instead. It doesn't matter to the compiler, but it matters to humans.

If you were really set on attempting to reuse the memory, I'd walk backwards along the iterator to help avoid index invalidation:

fn double_even(mut v: Vec<i32>) -> Vec<i32> {
    for i in (0..v.len()).rev() {
        let val = v[i]; 
        if val % 2 == 0 {
            v.insert(i, val);
        }
    }
    v
}

This is probably algorithmically worse; each insert moves all the data after it. I believe the worst-case would be O(n^2) when every element were even.

I also wouldn't normally take by-value here. I'd instead take a mutable reference. You could always wrap it back in a value if you really needed it:

fn double_even_ref(v: &mut Vec<i32>) {
    for i in (0..v.len()).rev() {
        let val = v[i];
        if val % 2 == 0 {
            v.insert(i, val);
        }
    }
}

fn double_even(mut v: Vec<i32>) -> Vec<i32> {
    double_even_ref(&mut v);
    v
}

As usually, I think your answer is better and more detailed about the different solutions. In a real case, I would try each solution and do some benchmark, but it was only some exercise, so... — Boiethios, Jun 09 '17 at 12:18

score 3 · Answer 3 · edited May 20 '17 at 13:18

Is it necessary to create another Vec? Is is possible to use the same Vec, i.e. modify it while iterating?

It's possible but not efficient. Vec allocates a block of memory on the heap, where each element is adjacent to the next one. If you just wanted to do some numeric operation on each element then yes, you could do that operation in place. But you need to insert new elements in between others, which would mean moving all the following elements one place to the right and (possibly) allocating more memory.

The Haskell code you are thinking about is probably using a Haskell Data.List which is a linked list not a vector. If you used a more memory-efficient structure like Data.Vector.Unboxed or repa then you would also not be able to insert elements while iterating.

My solution is, as I imagine, inefficient: I allocate a lot of vectors, and I have no guarantee that this will be optimized. Is it a better solution: readable and with less allocation?

Something like this might work. It has a functional feel still, but works by allocating one Vec and then mutating it:

fn double_even(v: Vec<i32>) -> Vec<i32> {
    // allocate for the worst case (i.e. all elements of v are even)
    let result = Vec::with_capacity(v.len() * 2);
    v.into_iter().fold(result, |mut acc, n| {
        acc.push(n);
        if n % 2 == 0 {
            acc.push(n);
        }
        acc
    })
}

You can also shrink_to_fit() at the end, but it would look a bit uglier, as you couldn't return the solution as an expression.

score 2 · Answer 4 · edited Jun 20 '20 at 09:12

Is it necessary to create another Vec? Is is possible to use the same Vec, i.e. modify it while iterating?

My solution is, I imagine, inefficient: I allocate a lot of vectors, and I have no guarantee that this will be optimized. Is there a better solution: readable and with less allocation?

One thing you can do which is quite idiomatic is to implement your function as an "iterator adapter" - that is, rather than dealing with Vec in particular, look at Iterators of i32 elements instead. Then everything will be a variable on the stack, and no allocations will be made at all. It could look something like this:

struct DoubleEven<I> {
    iter: I,
    next: Option<i32>,
}

impl<I> Iterator for DoubleEven<I>
    where I: Iterator<Item=i32>
{
    type Item = i32;
    fn next(&mut self) -> Option<Self::Item> {
        self.next.take().or_else(||
            self.iter.next().map(|value| {
                if value % 2 == 0 { self.next = Some(value) }
                value
            })
        )
    }
}

Then you can write

fn main() {
    let vec = vec![1, 2, 3, 4, 5, 6];
    let double_even = DoubleEven {
        iter: vec.into_iter(),
        next: None,
    };
    for x in double_even {
        print!("{}, ", x)  // prints 1, 2, 2, 3, 4, 4, 5, 6, 6, 
    }
}

Even better, you can add a function double_even to anything that can be turned into an iterator of i32, allowing you to write the following:

trait DoubleEvenExt : IntoIterator + Sized {
    fn double_even(self) -> DoubleEven<Self::IntoIter> {
        DoubleEven {
            iter: self.into_iter(),
            next: None,
        }
    }
}

impl<I> DoubleEvenExt for I where I: IntoIterator<Item=i32> {}

fn main() {
    let vec = vec![1, 2, 3, 4, 5, 6];
    for x in vec.double_even() {
        print!("{}, ", x)  // prints 1, 2, 2, 3, 4, 4, 5, 6, 6, 
    }
}

Now I will admit that in this case the boilerplate is adding up, but you can see that at the callsite the code is really very terse. For more complex adapters this pattern can be very useful. In addition, beyond the initial Vec allocation, there is no memory allocation going on whatsoever! Just stack-allocated variables, allowing for highly-efficient code in a release build.

How do I repeat some elements in a vector based on a condition?

4 Answers4