
I am working on a C++/OpenGL 3D mesh modeling software. Basically an extremely primitive Blender.

A Mesh contains a cache of vertices (the EditCache<Vertex> class below), whose vertex data is stored in a single chunk to send to glBufferData; that is the vertices vector below. Associated with each vertex are two dynamic arrays, one of edge indices and one of face indices, holding the IDs of the edges and faces that contain that vertex.

template<typename T>
class EditCache {
private:
    std::vector<T> vertices;
    std::vector<std::vector<int>> edge_indices;
    std::vector<std::vector<int>> face_indices;

public:
    int addVertex(T vertex) {
        vertices.push_back(vertex);
        edge_indices.push_back(std::vector<int>());
        face_indices.push_back(std::vector<int>());
        return static_cast<int>(vertices.size()) - 1;
    }

    void pairEdgeIndex(int vertex_index, int edge_index) {
        edge_indices[vertex_index].push_back(edge_index);
    }

    void pairFaceIndex(int vertex_index, int face_index) {
        face_indices[vertex_index].push_back(face_index);
    }

...

};

When a new edge or face is created in the Mesh class from existing vertices, the ID of the new edge/face is pushed onto the array in edge_indices/face_indices for each vertex contained in that edge/face. When a new vertex is created, it is pushed onto the vertices array, and two new empty vectors are pushed onto edge_indices and face_indices at the new vertex's slot to hold the lists of edges and faces it belongs to.

Roughly, the data structure looks like this (this is not how it is laid out in memory):

- vertex_0 | { edge_0, ..., edge_n } | { face_0, ..., face_m }
- vertex_1 | { edge_0, ..., edge_n } | { face_0, ..., face_m }
- vertex_2 | { edge_0, ..., edge_n } | { face_0, ..., face_m }
- ...

where each edge or face list is variable length. In memory, the three columns (vertices, edge-index lists, and face-index lists) are stored as three separate arrays.

When creating new quads, faces, or bigger structures with many new vertices, I will have to call std::vector::push_back() many times to create the new vertices and to pair each vertex with each of its new edges and faces. Allocating memory one element at a time is really slow. The places where I think I have room to optimize are:

1.) The arrays for the edge and face IDs do not have to be contiguous; they can be linked lists.

2.) I will know how many points I will be adding at the beginning of the task.

So in a perfect world, I would allocate a single chunk of the size I need and then append each of the data points to each linked list, one by one, in that new space. How can I accomplish this? Naturally, I don't want exposed new and delete operations, or exposed raw pointers into free space. Is there a way I can do this with std::unique_ptr or std::shared_ptr? How can I allocate the memory in a safe way?

BENG
  • What is your evidence that `std::vector::push_back` is "allocating memory one by one [and it is] really slow"? `push_back` is required by the C++ standard to be much smarter than that. – Sam Varshavchik Aug 28 '23 at 22:27
  • @SamVarshavchik Really? even if I have 20 separate push back calls to 20 different vectors? – BENG Aug 28 '23 at 22:30
  • When using `push_back`, if you already know how many elements you will be adding, it's always a good idea to call `reserve` to allocate enough memory up front. – paddy Aug 28 '23 at 22:30
  • @SamVarshavchik So using reserve will allocate the space and then I can use push back one by one, and it will slot it into that new space – BENG Aug 28 '23 at 22:32
  • Are there any memory leak issues with reserve? – BENG Aug 28 '23 at 22:32
  • @SamVarshavchik the issue with reserve I think, is that I will need to call reserve one by one for each distinct vector, because I am not push_back-ing a chunk to a single vector, I am doing it one by one to many. And so effectively calling reserve one by one is exactly the issue I'm having in the first place – BENG Aug 28 '23 at 22:35
  • [std::vector and std::string reallocation strategy](https://stackoverflow.com/questions/21352323/stdvector-and-stdstring-reallocation-strategy) and [Initializing the size of a C++ vector](https://stackoverflow.com/questions/25108854/initializing-the-size-of-a-c-vector) are worth reading if you're not sure how a vector allocates space when it is full. There are no additional memory leak issues with `resize` or `reserve`. If the vector is destroyed the memory is freed. – Retired Ninja Aug 28 '23 at 22:42
  • Yeah, I see that there are no memory leaks, I just read the docs. I'll take a look at this. – BENG Aug 28 '23 at 22:45
  • I still think that this will not be fast unless there is a way to run `std::vector::reserve` to multiple different vectors all at once. Does that make sense? – BENG Aug 28 '23 at 22:46
  • It's not clear why `face_indices` or `edge_indices` are vector-of-vector. This seems way overkill. For ordinary triangular meshes, each face comprises _exactly_ three vertices, and can be represented by a struct. You mention quads and larger structures, so I guess your mesh representation is N-gons. You can _still_ avoid the vector-of-vector in such cases, by making `faces` a vector that stores the first index and number of vertices for the face. Then maintain a single vector `face_vertices` that stores contiguous chunks. If you add, remove or resize a face, you flag the mesh for re-indexing. – paddy Aug 28 '23 at 22:49
  • @BENG Nothing stops you from reserving all of your vectors one at a time before doing your work with them. Reserving one time is likely not going to be the kind of bottleneck you are clearly thinking it is. – Remy Lebeau Aug 28 '23 at 22:49
  • Each `vector` needs to be operated on separately. No way you can safely escape that. Don't try to make one and then `memcpy` it because `vector`, along with the vast majority of Standard Library classes, is too complicated to survive a bit-to-bit copy. – user4581301 Aug 28 '23 at 22:49
  • That's why I was thinking of changing the data type. That was the whole point of the question. – BENG Aug 28 '23 at 22:50
  • Have you done any profiling to figure out whether vector additions are actually a bottleneck? Switching to some complicated and possibly buggy custom allocation scheme might only make sense if it will really speed things up significantly. Premature optimization is a bad idea. – Dave S Aug 28 '23 at 22:52
  • @paddy I dont think that works, because a single point may have an arbitrary amount of edges coming out of it. That is why it must be a dynamic array. Each time we create an edge from an existing point, we append that edge id to the array – BENG Aug 28 '23 at 22:56
  • You could for example do a test where you skip all allocations and just write and re-write all the data to a single vertex with 1 edge and 1 face: vertex[0], face[0], edge[0]. Then compare it to the real version, and decide if the difference is really worth optimizing despite the extra effort and risk of bugs. – Dave S Aug 28 '23 at 22:56
  • @DaveS Yeah, in the end this all might be a moot point, because creating vertices, faces, etc. is a user-prompted operation, which happens slowly compared to the actual frames being rendered. The user will most likely not use a tool that creates objects less than a second apart, and in that time 60 frames have already been rendered drawing the shapes. – BENG Aug 28 '23 at 22:59
  • Even then, one second is very quick to be creating objects. The whole idea of using push back so many times just felt yucky to me so I wanted to see if there is a smarter way. – BENG Aug 28 '23 at 23:00
  • Look at it as being safe -- no memory leaks, no buffer overruns, no hard to track down logic flaws in your own custom allocation scheme. I started out with C and K&R C++ so I still tend to write some C-like code, but the std:: library really is better for most collections and strings. – Dave S Aug 28 '23 at 23:02
  • Start with the stupidest and simplest possible solution that has a reasonable chance of reaching the requirements. Implement it. Then test it and prove it works. Then profile and find the bottlenecks. Complicate the bottlenecks as little as possible to reach the necessary performance targets. You may have a problem with memory allocation, but until you've tested and found that you have a problem with memory allocation, stick to dumb, ol' `vector`. If it turns out that `vector` isn't good enough you still need the baseline measurements to prove whatever you replace it with really is faster. – user4581301 Aug 28 '23 at 23:07
  • @DaveS In your experience, is 24 push_back calls extremely slow, or is it passable? – BENG Aug 28 '23 at 23:16
  • ...and consider long-term maintenance. A year later you'll still understand how vector works. If you replace it with some super-clever custom allocation that's 20% faster how hard will it be to understand and update in a year or two? We have code at work that's 15-20 years old. Super clever is often bad unless you really really need it. – Dave S Aug 28 '23 at 23:16
  • > "is 24 push_back calls extremely slow" --- less than 100 of most anything done in-memory to a collection is too fast to notice. Even 1,000+ is usually too fast to care unless you're doing something O(N^2) with it like linear search of an unsorted list. Even then if you only do it once you might not care -- 1/100 second vs. 1/2 second doesn't matter for a lot of things. – Dave S Aug 28 '23 at 23:19
  • For a better perf improvement, change `int addVertex(T vertex) {` to `int addVertex(const T& vertex) {`. – selbie Aug 28 '23 at 23:55
  • Are you testing optimized / release mode? I ask this because msvc adds a lot of overhead for heap / memory corruption testing in debug mode and that can cause a massive slowdown. I have seen debug mode take 100 times longer to execute than release mode for the same code and data. – drescherjm Aug 29 '23 at 00:02

1 Answer


So in a perfect world, I would allocate a single chunk of the size I need and then append each of the data points to each linked list one by one in that new space. How can I accomplish this task?

You can reserve space for the vector upfront when you know how much space you need:

 std::vector<int> x;
 x.reserve(100000);

 // then later
 x.push_back(42);  // no allocation happens here

Frankly, I think your other questions arise only from expecting things to be more complicated than they really are. You do not need manual new or delete for this, nor smart pointers; a vector can do it, and it is not unsafe.

463035818_is_not_an_ai