
I am currently working with vectors and trying to ensure I have what is essentially an array of my vector's contents on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?

Having read the Rustonomicon's chapter on implementing Vec, it seems to me that a Vec strides over pointers on the heap, dereferencing at each entry. I want to chunk Vec entries from the heap onto the stack for fast access.

  • You probably misunderstood some part of the Rustonomicon, as `Vec`s are contiguous allocations of memory. Of course, the elements themselves might somehow be boxes or pointers. Either way, it would be a good idea to link what you read in the Rustonomicon and show what kind of `T`s you expect, so maybe an answer can lift your confusion. – Caesar Feb 02 '21 at 01:59
  • https://doc.rust-lang.org/nomicon/vec.html https://doc.rust-lang.org/nomicon/vec-layout.html From https://doc.rust-lang.org/std/vec/struct.Vec.html: "..Vec will never perform a "small optimization" where elements are actually stored on the stack for two reasons:.." Due to this I am led to believe all values are stored on the heap. Given the iterative implementation in the 'nomicon, I believe it must dereference at each pointer increment, which, as mentioned, is on the heap. I am expecting generic `T`s to be specific – Bots Fab Feb 02 '21 at 02:01
  • Ha, no. Because the stack can be popped off and should have better cache locality. I have only ever read of the heap implemented as trees and hash maps, which are spread throughout memory, so the lookup time was worse, since every dereference has to search these data structures (which can be fragmented) instead of the ordered addresses of the stack data structure. – Bots Fab Feb 02 '21 at 02:36
  • A [`Vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html)'s data is always contiguous ("A contiguous growable array type"), which is as good as you can get for locality. You seem to be confused between the data types "stack" and "heap" and the memory regions "stack" and "heap". I encourage you to research the difference; I assume that will clear things up a great deal. – Shepmaster Feb 02 '21 at 02:57
  • ...that's literally why they are called stack and heap... – Bots Fab Feb 02 '21 at 03:02
  • Shepmaster is, as usual, correct. The heap-like nature of memory allocation only comes into play when you're allocating or freeing memory -- you don't have to traverse *a heap* (data structure) every time you use something that's stored in *the heap* (generic term for dynamically allocated memory, which may or may not be structured as an actual heap). Once you have a pointer to the data (which a `Vec` is), it doesn't matter whether it's on the stack or the heap. – trent Feb 02 '21 at 03:28
  • Stack-based addressing is done with integer arithmetic calculated from the relative position of the program counter. Heap memory is addressed by searching a tree (or other sparsely backed data structure); this is because, after allocation, the memory is fragmented and the addresses are not relative to each other. Please correct me if I am still wrong. https://stackoverflow.com/questions/24057331/is-accessing-data-in-the-heap-faster-than-from-the-stack https://stackoverflow.com/questions/51928246/what-is-the-difference-between-stack-pointer-and-program-counter – Bots Fab Feb 02 '21 at 07:07
  • Stack-based addressing is done with integer arithmetic calculated from the _stack pointer_ (not the program counter). Heap memory is addressed with integer arithmetic calculated from the base pointer of the object being addressed. Potato/Potahto. Heap traversal only occurs when memory is being allocated or freed, never for regular addressing (assuming the allocator uses a heap structure, which it may not). – Jmb Feb 02 '21 at 07:35
  • Thanks for clarifying the SP. How can you resolve a set of arbitrary addresses at the same speed as a set of relative addresses? Can you address the point made under Access in the first response: https://stackoverflow.com/questions/24057331/is-accessing-data-in-the-heap-faster-than-from-the-stack#answer-24057744 – Bots Fab Feb 02 '21 at 07:54
  • There is no "set of arbitrary addresses". `Vec` contains a simple pointer to the data, that's it. No resolving, just a single dereference, exactly like with (arbitrarily-sized) data on the stack. – user4815162342 Feb 02 '21 at 09:10
  • The entire point of a stack is that it's contiguous. Why have two data structures then at all? Why not just allocate on the stack? The operating system's memory virtualization and management, together with the scheduler, solve a resource-fitting problem by using two very distinct data-structure methodologies. Unless something changed since I last studied this, we aren't referring to Vec but rather to what the "global allocator" is doing behind the scenes, which is a call to the operating system for non-contiguous, dynamically sized memory. – Bots Fab Feb 02 '21 at 16:17
  • Look, it's like this. Imagine you work at a library. Occasionally people will come in and ask you for a book. Most of the time they want to read one of the new releases, like the latest Maggie Holt book or whatever. Right? So you have a shelf full of new releases, maybe ordered by date. This is the stack. But it would be impractical and useless to sort the whole library by date, so you have the other books organized by topic according to LCSH or whatever. This is the heap. The analogy is imperfect but bear with me. Now, having a pointer to something is like knowing what shelf it's on -- (1/) – trent Feb 03 '21 at 01:58
  • if you *already know* where the book is, it will take about as long to find it regardless of whether it's in the new releases section or not. Moreover, the *book itself* is a single contiguous chunk of pages, regardless of which shelf it comes from, so once you've found it it takes precisely the same amount of time to read regardless of which shelf you found it on, because *all the pages are next to each other in order*. Similarly, a `Vec` is *a whole book*, a chunk of pages that are contiguous in memory, not a bunch of pages scattered across the whole library. Moving a book from the (2/) – trent Feb 03 '21 at 02:03
  • "heap" to the "stack" won't make you read it faster because *once you have the book* it no longer matters how you got there. To make the analogy closer to the actual stack and heap, imagine that you instead run a "book storage" business so instead of sorting by topic and date you instead sort all books by size and weight, and instead of looking them up when a customer requests a particular title, you only look up a book when that particular volume's owner comes in and wants to remove it. (3/3) – trent Feb 03 '21 at 02:15
  • https://stackoverflow.com/questions/45753923/can-a-vector-be-moved-and-modified-without-an-extra-allocation This helped me a lot. The memory is contiguous; you can't beat a Vec unless you want to remove extra capacity with a shrink. Thanks, everyone. – Bots Fab Feb 06 '21 at 06:05
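
To make the contiguity point from the comments above concrete, here is a minimal sketch (not from the original thread; the element type and values are arbitrary) showing that every element of a Vec lives at a single base pointer plus an offset:

fn main() {
    let v: Vec<u64> = (0..8).collect();
    let base = v.as_ptr() as usize;
    for (i, elem) in v.iter().enumerate() {
        // One base pointer plus an offset -- no per-element lookup or
        // traversal of any allocator data structure.
        assert_eq!(elem as *const u64 as usize, base + i * std::mem::size_of::<u64>());
    }
    println!("{} contiguous elements starting at {:p}", v.len(), v.as_ptr());
}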

1 Answer


You can use the `unsized_locals` feature in nightly Rust:

#![feature(unsized_locals)]

fn example<T>(v: Vec<T>) {
    // Dereferencing the boxed slice moves the unsized value `[T]` into a
    // local variable, which is exactly what `unsized_locals` allows.
    let s: [T] = *v.into_boxed_slice();
    dbg!(std::mem::size_of_val(&s));
}

fn main() {
    let x = vec![42; 100];
    example(x); // Prints 400 (100 elements × 4-byte i32)
}
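
Since `unsized_locals` is nightly-only, note a stable alternative (a sketch of my own, not part of the original answer, and it assumes the length is known at compile time and `T: Copy`): copy the elements into a fixed-size array via `TryInto`.

use std::convert::TryInto;

fn main() {
    let v = vec![42i32; 100];
    // Works on stable when the length is a compile-time constant and the
    // element type is Copy: the elements are copied out of the Vec into
    // an array on the stack.
    let arr: [i32; 100] = v.as_slice().try_into().expect("length mismatch");
    dbg!(std::mem::size_of_val(&arr)); // Prints 400
}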

> I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec

Sure you can.

> Vec [...] seems to stride over pointers on the heap, dereferencing at each entry

Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
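
To see why, consider this small sketch (not from the original answer): indexing into a Vec and indexing into a stack array are the same base-plus-offset address computation.

fn main() {
    let v = vec![1u32, 2, 3, 4];
    let a = [1u32, 2, 3, 4];

    let elem_size = std::mem::size_of::<u32>();
    // Element address = base pointer + index * element size, in both cases.
    assert_eq!(&v[2] as *const u32 as usize, v.as_ptr() as usize + 2 * elem_size);
    assert_eq!(&a[2] as *const u32 as usize, a.as_ptr() as usize + 2 * elem_size);
}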

> for fast access

I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.

Shepmaster
  • I was under the impression that dereferencing from the stack would be faster than from the heap's data structure. I am implementing an algorithm lazily that has time to chunk in memory asynchronously while processing, so I thought this might save some time. Thank you. I could not get the compiler to allow me to call `.into_boxed_slice()` because `[Cell]` did not implement the `Sized` trait. I assumed this was due to not having `[Cell; n]` explicitly declared, since `T` was sized. I will try again and read up on nightly. – Bots Fab Feb 02 '21 at 02:14
  • @BotsFab the initial deref will be slower because the stack is in-cache while the heap location is not, but loading the vec will cache it either way. Except by loading the data onto the stack you're bloating the cache, and possibly the code itself (not sure about unsized locals, but I know that in C, VLAs lead to absolute garbage codegen). – Masklinn Feb 02 '21 at 06:52
  • @Masklinn the would-be VLAs are relatively small and fit well in the cache. It's a tree structure where the edges from a given node are what is being chunked onto the stack, and I plan to limit the edges in configuration (or automatically) to fit on cache lines if necessary. This is so much of an edge case that it's probably off topic, but it seems to be garnering justifiable interest. This minutia is important because it's an ML algorithm that scales (and for my sentimental value of speed). – Bots Fab Feb 02 '21 at 07:33
  • @BotsFab if they fit in the cache, they'll fit just as well as the Vec they already are; it should make no difference at best. The problem with dynamic stack allocation is the sub-par code generation produced by the compiler. – Masklinn Feb 02 '21 at 07:45
  • @Masklinn I am not familiar with the codegen from dynamic stack allocation; I was hoping it would all be slotted into a chunk that could be quickly popped through, relative to the register loads and instructions. Thank you for your insight. – Bots Fab Feb 02 '21 at 07:56
  • TBF I don't know if this has the same issue as VLAs, but I do know that one of the reasons VLAs were ultimately stripped out of the Linux kernel (and various other codebases) is that they lead to really bad code. – Masklinn Feb 02 '21 at 09:57