0

I am developing an MLOps framework that supports multiple database backends and lets users customize some fields. Because the number of these fields is unknown at compile time, I can only represent them with a Vec. However, a trait method for one of the databases is called frequently when inserting data, and its signature only accepts an array type. The solution provided for the linked question offers a good approach, but I believe it still involves heap allocation, which may not be fast enough for frequent inserts and could slow down the user's entire application.

Suppose there's a vector of a certain type with a known size:

let vec = vec![1, 2, 3, 4];

How can we turn it into an array?

The method should have minimal memory overhead, with no unnecessary clones or heap operations.

I've tried this solution:

fn vec_to_array(vec: Vec<i32>) -> [i32; 4] {
    let boxed_slice = vec.into_boxed_slice();
    let ptr = Box::into_raw(boxed_slice) as *mut [i32; 4];
    unsafe { std::ptr::read(ptr) }
}

However, it appears to have poor performance since it requires memory allocation on the heap.

Yu Sun
  • 2
    Does this answer your question? [Is there a good way to convert a Vec to an array?](https://stackoverflow.com/a/29570662/20665825) – Jonas Fassbender Apr 27 '23 at 16:01
  • I'm not sure. But what about the memory overhead with `try_into()`? – Yu Sun Apr 27 '23 at 16:14
  • 2
    Do you have a measurable performance problem or are you just stressing about something that might not happen? Be wary of premature optimization! Normally Rust is pretty good about not using resources it doesn't need, but if you're converting large vectors into large arrays on an extremely frequent basis you may need to address this at the *algorithmic* level rather than through micro-optimizations. – tadman Apr 27 '23 at 16:27
  • `try_into()` does involve copying similar to your handwritten solution (see source code [here](https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#3229)). There's [this answer](https://stackoverflow.com/a/50080940/20665825) that shows how to convert `&[i32]` to `&[i32; 4]`, which does not involve copying. – Jonas Fassbender Apr 27 '23 at 16:32
  • @tadman You're right. But it could be an operation that occurs with (possibly) high frequency within our database application. Maybe I should just go ahead now. – Yu Sun Apr 27 '23 at 16:39
  • @JonasFassbender What about the method in the `arrayvec` crate? I've found that this may work: `let mut array_vec = ArrayVec::<[i32; 4]>::new(); array_vec.extend_from_slice(&vec[..]); let array = array_vec.into_inner().unwrap();` – Yu Sun Apr 27 '23 at 16:41
  • 1
    `ArrayVec::into_inner` also uses `ptr::read` and therefore copies its contents to a new array (see [here](https://docs.rs/arrayvec/0.7.2/src/arrayvec/arrayvec.rs.html#664)). – Jonas Fassbender Apr 27 '23 at 17:00
  • 2
    If it's a frequent operation, then the real question is why bother with an array in the first place? You already have a `Vec`. Why not stick with it? Converting requires moving data, which is pointless if you can avoid it. – tadman Apr 27 '23 at 18:29
  • Does this answer your question? [Is there a good way to convert a Vec to an array?](https://stackoverflow.com/questions/29570607/is-there-a-good-way-to-convert-a-vect-to-an-array) – Zephyr Apr 27 '23 at 19:51
  • @tadman I'm working on an MLOps framework that supports multiple databases as backends, one of which needs an array in its trait method. Since I want users to be able to customize fields, the concrete number of these fields is unknown at compile time. So it is represented as a Vec in my crate but needs to be converted to an array when used with certain backends. – Yu Sun Apr 28 '23 at 02:51
  • @Zephyr Not yet. I've edited my question to explain the difference. – Yu Sun Apr 28 '23 at 03:04
  • 1
    Do you need a literal array, or just an array reference like `&[X]`? If it's just a reference, a `Vec` can provide that. – tadman Apr 28 '23 at 14:04

2 Answers

1

Your solution is probably slow because it's leaking memory. The box you create is never freed. In order to write this correctly, you need to free the box without freeing the items in the slice (they have been copied naively), which you can do with ManuallyDrop.

let boxed_slice = vec.into_boxed_slice();
let ptr = Box::into_raw(boxed_slice);
let arr = unsafe { (ptr as *mut [T; 4]).read() };

unsafe { drop(Box::from_raw(ptr as *mut [ManuallyDrop<T>; 4])) };

arr

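This doesn't necessarily require an allocation: into_boxed_slice only reallocates when the capacity of the Vec is different from its length. Also, this function (just like yours) must be marked unsafe, because reading the array is undefined behavior if the Vec doesn't hold exactly 4 elements.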
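As a minimal sketch, the fragment above could be wrapped in a complete function along these lines. The name `vec_into_array`, the const parameter `N`, and the up-front length check are illustrative additions rather than part of the original answer; the check is what allows the wrapper itself to be a safe function.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper (name and signature are assumptions for illustration).
/// Moves the elements of `vec` into a `[T; N]`, or hands the vector back
/// unchanged if its length is not exactly `N`.
fn vec_into_array<T, const N: usize>(vec: Vec<T>) -> Result<[T; N], Vec<T>> {
    if vec.len() != N {
        return Err(vec);
    }
    // May reallocate if the capacity differs from the length.
    let boxed_slice: Box<[T]> = vec.into_boxed_slice();
    let ptr: *mut [T] = Box::into_raw(boxed_slice);
    // SAFETY: the slice holds exactly N initialized elements, so it can be
    // read as a `[T; N]`. This is a bitwise copy out of the heap buffer.
    let arr = unsafe { (ptr as *mut [T; N]).read() };
    // SAFETY: `ptr` came from `Box::into_raw`, and `[ManuallyDrop<T>; N]` has
    // the same layout as `[T; N]`. Dropping this box frees the buffer without
    // running the element destructors (the elements now live in `arr`).
    unsafe { drop(Box::from_raw(ptr as *mut [ManuallyDrop<T>; N])) };
    Ok(arr)
}
```

Calling it with a type that owns heap data, such as `String`, is a reasonable way to check under a tool like Miri that nothing is leaked or dropped twice.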

However, just using try_into is almost certainly as fast as or faster than your method.

vec.try_into().unwrap()

This doesn't allocate.
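Note that `unwrap` panics when the length does not match. A minimal sketch of handling the mismatch instead is shown here; it relies on the standard library's `TryFrom<Vec<T>> for [T; N]`, whose error value is the original `Vec`. The function name `fields_to_array` and the choice to report the actual length as the error are assumptions made for illustration.

```rust
use std::convert::TryInto; // already in the prelude from the 2021 edition on

/// Hypothetical helper: converts `fields` into a fixed-size array, or
/// reports the length the vector actually had.
fn fields_to_array<const N: usize>(fields: Vec<i32>) -> Result<[i32; N], usize> {
    // On a length mismatch, `try_into` returns the untouched Vec as the error.
    let result: Result<[i32; N], Vec<i32>> = fields.try_into();
    result.map_err(|v| v.len())
}
```

Because the error carries the original vector, a caller could also recover it and keep using it instead of discarding it as this sketch does.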

There are also other ways to create an array. If the Vec is exactly the right size, which usually happens if it was just created from a literal, then you can turn it into a boxed array with the same method, without allocating.

fn boxed_array<const LEN: usize>(vec: Vec<i32>) -> Box<[i32; LEN]> {
    vec.try_into().unwrap()
}

If you don't need ownership, then you can once again avoid allocating.

fn slice_to_array<const LEN: usize>(slice: &[i32]) -> &[i32; LEN] {
    slice.try_into().unwrap()
}
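To connect this with the question's scenario, here is a hedged sketch of passing a `Vec`'s contents to a method that only accepts an array reference. `insert_row` is a hypothetical stand-in for the backend's trait method (its real signature is not given in the question), and `insert_from_fields` is likewise an illustrative name.

```rust
use std::convert::TryInto; // already in the prelude from the 2021 edition on

/// Hypothetical stand-in for a backend method that only accepts an array.
/// Here it just sums the row so the result is observable.
fn insert_row(row: &[i32; 4]) -> i32 {
    row.iter().sum()
}

/// Borrows the caller's buffer as an array reference: no copy of the
/// elements and no allocation. Returns `None` on a length mismatch.
fn insert_from_fields(fields: &[i32]) -> Option<i32> {
    let row: &[i32; 4] = fields.try_into().ok()?;
    Some(insert_row(row))
}
```

A `&Vec<i32>` coerces to `&[i32]`, so `insert_from_fields(&vec)` works directly and the vector stays usable afterwards.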

But keep in mind, the fastest thing you can do is make an array in the first place, and the second fastest is to leave it as a Vec.

drewtato
0
use std::mem::MaybeUninit;

// We take the vec by mutable reference so you can reuse the vector; if you are
// concerned about performance, this should be the default.
fn vec_to_array<T, const SIZE: usize>(vec: &mut Vec<T>) -> Option<[T; SIZE]> {
    if vec.len() != SIZE {
        // goes without saying
        return None;
    }

    Some(unsafe { vec_to_array_unchecked(vec) })
}

// # Safety
// Caller must ensure vec.len() == SIZE
unsafe fn vec_to_array_unchecked<T, const SIZE: usize>(vec: &mut Vec<T>) -> [T; SIZE] {
    debug_assert_eq!(vec.len(), SIZE); // debug assert for benefit of future self

    // this just reserves stack space (or even uninitialized registers for small arrays);
    // you may want to make this a mut parameter for greater control over the stack
    let mut arr: MaybeUninit<[T; SIZE]> = MaybeUninit::uninit();

    let (ptr, len) = (vec.as_ptr(), vec.len());

    // ownership of the elements is transferred to the array
    vec.set_len(0);

    // move the memory
    std::ptr::copy_nonoverlapping(ptr, arr.as_mut_ptr() as *mut T, len);

    // array is now initialized
    arr.assume_init()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_vec_to_array() {
        let mut vec = vec![1, 2, 3, 4, 5]; // <- only allocation

        let fail = vec_to_array::<i32, 6>(&mut vec);
        assert!(fail.is_none());
        assert_eq!(vec.len(), 5);

        let arr = vec_to_array::<i32, 5>(&mut vec).unwrap();
        assert_eq!(arr, [1, 2, 3, 4, 5]);
        assert_eq!(vec.len(), 0);
    }
}
Jakub Dóka