Where should I allocate this memory?

Question

for( int i = 0; i < lines; i++ ) {
    std::unique_ptr<BYTE[]> pLine( new BYTE[lineSize] );
    //Do stuff
}

Now, pLine is declared inside the loop because it's only used in the loop body. However, wouldn't allocating only once outside the loop reduce the amount of allocations performed (avoiding memory fragmentation)?

std::unique_ptr<BYTE[]> pLine( new BYTE[lineSize] );
for( int i = 0; i < lines; i++ ) {
    //Do stuff
}

I could believe the compiler would be able to optimize the first version easily if it knew lineSize stays the same throughout iterations; but it does change throughout function calls, so I can't make it a constant.

I also think micro optimizations like this should be avoided until a performance problem is detected, so I guess I would stick with the first version. What do you guys think?

If the value of `lineSize` changes throughout the loop, how would you allocate `pLine` only once outside (unless you know the size of the largest line)? — Frédéric Hamidi, Jun 08 '12 at 17:19
@FrédéricHamidi: I assume he meant that what it points to changes, not the pointer itself (could be wrong). — Ed S., Jun 08 '12 at 17:20
It doesn't change throughout the loop; it changes throughout function calls (this is inside a function). For each function call, it remains the same. — dario_ramos, Jun 08 '12 at 17:20

Nawaz · Answer 1 · 2012-06-08T17:27:38.707

3

First of all, I think you're using wrong tool for that. What you should use is std::vector as:

std::vector<BYTE>  data(count); //allocate and initialize

It allocates the memory and initializes all the elements. If you want it to allocate only, without any initialization, then write this:

std::vector<BYTE>  data;
data.reserve(count); //allocate only

Now, where you should declare this? It depends on it's usage. But try to reduce the scope of the variable: if it is needed only in the for loop, then declare it inside the loop itself.

edited Jun 08 '12 at 17:27

answered Jun 08 '12 at 17:20

Nawaz

353,942
115
666
851

It does not differ, I suppose. In some cases pointer will be even better, than vector, which takes too much overhead to work with. – Forgottn Jun 08 '12 at 17:21
3

@Forgottn: vector takes *too much overhead*? any proof for that? – Nawaz Jun 08 '12 at 17:25
If you're just filling an known-size array of bytes with data, pointer will be just perfect, while vector will bring all the checking stuff with it, and will also initialize memory if you're just allocating it. Plus vector operates with at least two pointers, while simple array will need only one. – Forgottn Jun 08 '12 at 17:30
@Forgottn: What you're talking about is not true. (btw, what do you mean by *"known-size"* in this context?) – Nawaz Jun 08 '12 at 17:32
Cm'on don't be naive, look at my answer (it's just in the bottom) for the *simple* `push_back` code. – Forgottn Jun 08 '12 at 17:35
(Added the code). Tell me, what's all this code does there inside a `push_back`? I've just wanted to write one trailing byte! – Forgottn Jun 08 '12 at 17:38
@Forgottn: So one `if` is *"too much overhead"* in your opinion? Don't you want safe-programming? – Nawaz Jun 08 '12 at 17:38
2

@Forgottn: "Checking stuff"? Checked iterators and bounds checking should not be on by default, and you can always disable them if they are. You're making assumptions about performance based on missing or incorrect information. – Ed S. Jun 08 '12 at 17:40
2

@Forgottn: "Naive"? Really? Use the simplest solution (that is not obviously a bad idea) that works well. If it doesn't work well or is not performant then change it. – Ed S. Jun 08 '12 at 17:41
@Nawaz, please see my comment in the branch below (sorry for it). It's about the overhead. Known-size, means that the OP allocates the memory just in the beginning of the loop. And it's size never changes. – Forgottn Jun 08 '12 at 17:55
@EdS., please, see my comment below. It's an answer on this question, too. – Forgottn Jun 08 '12 at 17:56
4

@Forgottn: Your logic is flawed. The cost of a vector is very low. Read: http://stackoverflow.com/a/3664349/14065 – Martin York Jun 08 '12 at 18:04
@LokiAstari, we're talking not about the initialization of the vector. And yes, you will not be able to use that cheat code to initialize it with different values. – Forgottn Jun 08 '12 at 18:07
@Forgottn: you keep talking about `push_back`, and I don't know why. Nobody except you is suggesting altering the original code to use `push_back`. – Mooing Duck Jun 08 '12 at 18:12
@Forgottn: As pointed out in the linked article your logic is flawed and **wrong**. Try timing it and see if you can see **any** performance difference with optimizations turned on. – Martin York Jun 08 '12 at 18:13
@Forgottn: I don't know why your answer was deleted, I do not think it should have been. I think the mod made a bad call there. :( – Mooing Duck Jun 08 '12 at 18:22
@MooingDuck, nah, I don't care. Thanks for the tips, although. – Forgottn Jun 08 '12 at 18:58
@LokiAstari, you can see a profiling code here: http://ideone.com/wIfpf. Input size: 100000. Dumping the results here: do_test_vector1 (preallocate, operator []): 16 seconds total, 0.00016 average; do_test_vector2 (reserve, push_back): 64 seconds total, 0.00064 average; do_test_array (allocation, operator []): 6 seconds total, 0.00006 average. Build Visual C++ 2010 SP1, default console project with release build (optimization /O2: speed). This means, that array is at least 3 times faster on this operation. – Forgottn Jun 08 '12 at 19:12
@MooingDuck, please, see the test run results. Neither case with vector is optimal. – Forgottn Jun 08 '12 at 19:15
Using `g++ -std=0x -O3 speed.cpp` (and fixing for non standards compliance). I get (inc clock cycles) `V(12390000) V2(81540000) V3(11510000)`. Thus the vector(V1) is practically identical to unique_ptr(v3). The outlier is push_back()(v2) but since you are the only one talking about that its not relevant. I believe you are suffering from `Bad test mythology` where you forgot to warm up your cache and thus the ordering of the test is significant to the timing results. Second run with the first and last test swapped around `V(11610000) V2(75780000) V3(12310000)` Even closer this time. – Martin York Jun 08 '12 at 19:41
@LokiAstari, three different runs with three different tests and *cold* cahce gave me the almost the same results: 15 seconds for v1, 65 seconds for v2 and 5 seconds for pointer. Why do you think, that somebody will *warm up* the cache in the real programs? Regarding the `push_back`, do you know how to replace it if we're talking about the `reserve` call, instead of `resize`? – Forgottn Jun 08 '12 at 19:47
@Forgottn: `Why do you think, that somebody will warm up the cache`: No. But I think your timings are skewed and thus incorrect because of them (there is no way that vector is 3 times slower it would not be used if so and performance of vector is nearly identical to a built-in array I am sure unique-ptr is not 3 time faster than that). This argument has been done to death before on SO. It has been shown many times that your assumptions are **in-correct** (as I have done above). If you want to get beaten down more please continue on chat where I am sure the user will put you straight `again`. – Martin York Jun 08 '12 at 19:54
@LokiAstari. Ok, which problems are brought by three different runs of three different programs? The test result hadn't changed. Pointer is still three times faster, than vector, because it does not fill the memory with data and does not perform any runtime checks. – Forgottn Jun 08 '12 at 19:57
@Forgottn: Then you are doing something else wrong. Since vector has performance characteristics similar to a built-in array (when you turn the optimizer on) I doubt that the unique-ptr is three time faster than a built-in array. PS. There are no runtime-checks done when accessing a vector (just like an array). And const initialization of memory is not expensive in comparison to the memory management routines. – Martin York Jun 08 '12 at 19:59
@LokiAstari, access yes, no overheads, but do you hear me? Vector does fill every @#@$ byte in the memory when constructing itself. It's an overhead. And it brings delays, depending on memory size and *cache temperature*. – Forgottn Jun 08 '12 at 20:05
@Forgottn: The cost of init is not linear to size (you don't think there is a tight loop running the length of the array do you!!!!! ;-). The cost of memory management is several orders of magnitude larger than the cost of initialization. – Martin York Jun 08 '12 at 20:08

score 2 · Accepted Answer · answered Jun 08 '12 at 17:58

The question is really what is in the //do Stuff

for( int i = 0; i < lines; i++ ) {
    std::unique_ptr<BYTE[]> pLine( new BYTE[lineSize] );
    //Do stuff
}

I would assume you are allocating the memory inside the loop to unique_ptr because somewhere in the //Do stuff section you are passing ownership of that memory to another object.

This if you try and do this:

std::unique_ptr<BYTE[]> pLine( new BYTE[lineSize] );
for( int i = 0; i < lines; i++ ) {
    //Do stuff
}

The first time around the loop ownership is transfered and now pLine contains a NULL pointer for the second a subsequent iteration.

If you are not transfering ownership then you are probably using the completely wrong method for memory management and a vector is probably a better idea.

for( int i = 0; i < lines; i++ ) {
    std::vector<BYTE> pLine(lineSize);
    //Do stuff
}

The advantage of putting it inside the loop it is re-initialized so all members are zero on every iteration of the loop. If the //Do Stuff is affected by the state of the pLine then it may be incorrect to move it outside the loop (without also doing some more work to make sure the state of pLine is correct after each iteration.

Ownership is not transferred; I just perform some calculations on the byte values before writing the line to a file. So I guess a std::vector is better. On the other hand, say I allocate inside the loop. Think of a long running app. Wouldn't memory fragmentation start being an issue in that case? — dario_ramos, Jun 08 '12 at 18:34
No. The C++ memory allocation routines were written for this exact type of usage. It is expect lots of short lived small objects and has been written with this in mind. Since the memory is released and then immediately reallocated you will probably get the same piece of memory back. I have also seen compilers that understand how the allocator works and will take this exact situation and optimize it so that very little work is done inside the allocator. — Martin York, Jun 08 '12 at 18:52

score 0 · Answer 3 · answered Jun 08 '12 at 17:21

0

I also think micro optimizations like this should be avoided until a performance problem is detected

This is not a "micro optimization", it's common sense really. Even if the compiler can lift the variable outside of the loop (did you check to see if it does?), why ask it to if it's not needed?

Micro optimizations make your code faster/more memory efficient at the cost of clarity. This does not fit that description.

On an unrelated note, are you sure you actually want a pointer to a BYTE[]? Seems odd...

answered Jun 08 '12 at 17:21

Ed S.

122,712
22
185
265

I read [here](http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=401) that I should use `[]` so that `delete[]` is called. – dario_ramos Jun 08 '12 at 17:25
1

@dario_ramos: use a `vector` instead. – Mooing Duck Jun 08 '12 at 17:39
2

@dario_ramos: Why not simply use a `std::vector`? – Ed S. Jun 08 '12 at 17:39
One `if`, and `_Reserve`, and `_OrphanRange` which are also not one-lined functions, and the `_Cons_val` at last and all for putting one symbol into the preallocated memory? It isn't funny. – Forgottn Jun 08 '12 at 17:43
1

@Forgottn: Ok, please tell me the performance difference between the two in the OP's situation. I'll be waiting. I'm not saying your wrong, I'm saying you don't know, you're just guessing. We don't even know what the OP's code is doing! – Ed S. Jun 08 '12 at 17:44
Not very good at english abbreviations. Which http://en.wikipedia.org/wiki/OP do you mean? – Forgottn Jun 08 '12 at 17:45
@Forgottn: Sorry, "OP" meaning "original poster", i.e., dario_ramos who posed the question. There are plenty of cases in which a `vector` is an inefficient solution, we just don't know if this is one of them. – Ed S. Jun 08 '12 at 17:46
Yep, understod already. (And I missed the original branch of the comments, sorry). So, yes, we don't know the OP's situation and, yes, pointer is better if we're talking of filling the fixed-size array of bytes with data. In other cases safety, brought by the vector is more convinient. – Forgottn Jun 08 '12 at 17:53
Its is a macro optimization. You are also changing behavior. The state of the memory may be altered each time around the loop. Re-initialiazing it at the beginning of the loop makes sure that the memory is in a clean state for `//Do stuff` moving this outside the loop may affect the algorithm. – Martin York Jun 08 '12 at 17:59
@LokiAstari: Yes, it is semantically different, but the OP has presented the problem as if it will not matter, and on top of that he didn't post the code that manipulates the buffer, so I am inclined to assume he is correct. – Ed S. Jun 08 '12 at 18:03

Where should I allocate this memory?

3 Answers3