Let's say my system has 4 GB of RAM, and a certain program consumes only 100 MB of memory throughout its execution and runs for a limited time, say only 30 seconds. Don't you think it's a good idea (to improve performance) not to deallocate memory at all during the execution of that program? Those 100 MB would be freed anyway when the program terminates.
-
What do you mean by "a good idea"? It would work, but I don't see any benefits of not deallocating, while I do see the bad side of promoting bad habits. – wolfPack88 Jun 03 '14 at 17:01
-
By "good idea", I mean the program would be faster as it would have no overhead of deallocation. – MetallicPriest Jun 03 '14 at 17:03
-
If you want your code to be an unmaintainable, unreusable mess and if you hate your readers, sure. And if you don't have ambitions to grow as a programmer and a human being, of course. – Kerrek SB Jun 03 '14 at 17:03
-
It sounds a lot like the old "well I'll use bubble sort because I only have 20 elements" logic... code that today has one purpose may find itself being used differently in the future. Being robust is always a good thing. – FatalError Jun 03 '14 at 17:03
-
`Don't you think it's a good idea not to deallocate memory` What do you think is "good" about this idea? – PaulMcKenzie Jun 03 '14 at 17:04
-
Consider that, in the future, you may want to diagnose actual memory leaks. Having valgrind complain about a ton of "leaks" that are freed anyway might get in the way of you finding the actual problem... – fouric Jun 03 '14 at 17:04
-
If you want to avoid the performance hit caused by dynamic memory deallocation, you should probably rather avoid dynamic memory allocation in the first place. – moooeeeep Jun 03 '14 at 17:05
-
If you can prove that the "overhead of deallocation" makes your program unacceptably slow, then by all means omit it. :) – dlf Jun 03 '14 at 17:05
-
If you consider those milliseconds to make a great deal of difference on a 30-second run, there are probably better places to chase milliseconds. – molbdnilo Jun 03 '14 at 17:07
-
1"Improve performance" by how much? If it runs for "a limited time" can you even tell? Assuming you cut, say 5 milliseconds off the run time. How many times will you have to run it to make up for the time you've spent asking this question and others have spent reading it? – Dale Wilson Jun 03 '14 at 17:12
-
Primarily opinion based, indeed. As the answers show. I don't share the majority opinion, and I think there is an interesting discussion to be had here, but apparently SO is not the place for such a discussion. So voting to close. Sorry. – rici Jun 03 '14 at 17:39
-
@rici This isn't opinion-based; look at my answer. Note all the numbers and facts. – Steve Cox Jun 03 '14 at 17:48
-
@SteveCox: You're only testing one use case. Try it with a frequency counter which accumulates a hash-table with, say, a million unique words and a couple of million counted words. But as I said in my other comment, this is not the forum for such discussions. – rici Jun 03 '14 at 18:06
-
@rici Why not? Calling this opinion-based is such a cop-out; it's clearly not. We have all the tools to test whether or not this is objectively performant. – Steve Cox Jun 03 '14 at 18:11
-
@SteveCox: Indeed, but it's heavily dependent on the particular use case. (Look at the memory pool strategy used in Apache as a more elaborate example.) Choosing and instrumenting a use case which demonstrates your preformed opinion is simply a way of hiding the fact that it is a preformed opinion. And from here, I will not respond again. – rici Jun 03 '14 at 18:14
-
@rici The scenarios you are describing are not the scenario from the question. The only scenario he is asking about is the one where there is memory that can be deallocated (will never be used again by the program) and one chooses not to deallocate it. This indicates a direct relationship between a deallocation and unwanted cache residency, which my use case tests and compares. The fact that this *may* not be the case for long-running scenarios where data must be persistently stored and retrieved is completely irrelevant – Steve Cox Jun 03 '14 at 18:28
-
`Choosing and instrumenting a use case which demonstrates your preformed opinion is simply a way of hiding the fact that it is a preformed opinion.` Science isn't about not having a preformed opinion. You have hypotheses and test them. If you don't already guess that something's true before you start testing, there's no point to testing; you can't interpret the results. He guessed there would be a memory advantage to deallocation, _but_ unlike you he went ahead and tested his conjecture using an example which doesn't have any immediately demonstrable flaws. Run your own tests. Get back to us. – Parthian Shot Jun 03 '14 at 18:41
-
@rici TL;DR? It's not a matter of opinion if it can be empirically tested. – Parthian Shot Jun 03 '14 at 18:42
-
@ParthianShot: http://ideone.com/y3CcGA With the `exit(0)` commented out, 10 runs, 2.206 - 2.291 seconds. With `exit(0)` uncommented, 10 runs, 2.075 - 2.167 seconds. (compiled on my laptop with -O3 -march=native) – rici Jun 03 '14 at 19:00
-
@rici That use case assumes that all of the allocated memory must stay allocated for the duration of the program, thereby allowing for no memory savings (every cache line is still a cache miss; if swapping must occur, it will occur in both cases). In such a case, there is no advantage to explicitly deallocating memory, because the moment after you deallocate the system deallocates anyway, so of course it would be slower: it just burns deallocation cycles at the end. In your use case, is it possible to deallocate / reallocate any of the memory before the program must end? – Parthian Shot Jun 03 '14 at 19:21
-
@rici It's just that, as a use case, it seems pretty rare. However, if your use case doesn't allow you to deallocate until the end anyway, it is sort of a moot point, and explicitly deallocating is less efficient. – Parthian Shot Jun 03 '14 at 19:23
-
@ParthianShot: Precisely. The only correct answer to the question in the OP is "it depends": depends on how early you can deallocate the allocated memory; depends on how much work it is to keep track of the allocations; etc. My example and Steve's example come from opposite ends of the spectrum of the first question: in his, the memory is immediately unneeded, and in mine it is needed to the end. His shows about a 16% slowdown, and mine an equivalent speedup. Draw whatever conclusions you feel justified. – rici Jun 03 '14 at 19:24
-
@ParthianShot: Again, last comment: both my use case and Steve's, in their precise form, are rare. But where does the common use case fall? Do you have an objective measurable experiment which will answer that question? Certainly, large hashmaps are common in the sort of work I do, but that may not be typical. – rici Jun 03 '14 at 19:26
-
@rici Well, yes, it does depend. But this site aims to give the correct answer for the majority of people. If memory reuse is possible relatively early in a program 90% of the time, then 90% of the time you should deallocate, so giving the advice "deallocate" is justified 90% of the time. – Parthian Shot Jun 03 '14 at 19:26
-
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/55017/discussion-between-rici-and-parthian-shot). – rici Jun 03 '14 at 19:27
4 Answers
The most important part of deallocating memory in a short-running program is facilitating memory reuse. Your processor has a limited cache, probably only a few MB, and if you continuously allocate more memory you can never take advantage of that.
If, however, you deallocate some memory and then reallocate and reuse it, there's a good chance that on reuse the memory will already be "hot" and resident in the cache. This means you won't incur the penalty of a cache miss, and your program will run faster.
Of course you are trading the deallocation/reallocation cost against cache misses, but continuously allocating more and more memory is guaranteed to incur last-level cache misses. Each of those misses costs ~100 cycles on a modern processor, way less than the cost of an allocation/deallocation, but there are millions of them.
Edit
So I made a little test program to test this.
#include <stdio.h>
#include <stdlib.h>
/* Allocate 1 MB, touch every byte, then free it so the next
   iteration can reuse the same, still cache-hot memory. */
void process1(void) {
    char *ptr = malloc(1 << 20);
    int i = 0;
    while (i < (1 << 20)) ptr[i++] = rand();
    free(ptr);
}

/* Same work, but the buffer is never freed, so each iteration
   gets fresh, cache-cold memory. */
void process2(void) {
    char *ptr = malloc(1 << 20);
    int i = 0;
    while (i < (1 << 20)) ptr[i++] = rand();
}

int main(void) {
    int i = 100;
    while (i--) process1();
    i = 100;
    while (i--) process2();
    return 0;
}
Both processes chew through 100 MB of data; the first deallocates it as it goes, the second does not. I profiled this with valgrind's cachegrind tool. Here are the results:
(columns: source line, Ir, I1mr, ILmr, Dr, D1mr, DLmr, Dw, D1mw, DLmw)
fn=process1
0 943720200 2 2 419430900 100 0 209715800 1638400 32767
fn=process2
0 943719900 0 0 419430800 100 0 209715700 1638400 1605784
Okay, not that exciting; I'll translate. By avoiding the free you saved less than 0.1% of instruction cycles, and you incurred about a million and a half last-level (LL) cache misses. If we use the standard cycle estimation formula CEst = Ir + 10·L1m + 100·LLm, that's about a 16% degradation in performance from avoiding the free in the function alone.
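Plugging the table's numbers into that formula (where L1m = I1mr + D1mr + D1mw and LLm = ILmr + DLmr + DLmw) gives a rough back-of-the-envelope check:

    process1: 943,720,200 + 10 * 1,638,502 + 100 * 32,769    ≈ 963.4 million cycles
    process2: 943,719,900 + 10 * 1,638,500 + 100 * 1,605,784 ≈ 1,120.7 million cycles

and 1,120.7 / 963.4 ≈ 1.16, i.e. roughly 16% more estimated cycles when the free is skipped.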
I reran it under callgrind to get the full story (I'm not going to post the detailed results because they are far more complicated than the cachegrind ones), but when we include the full call to free the result is the same: about 16% more cycles are used when you don't call free.
Feel free to do this same test on your own program, but in this case the results are clear. The cost of managing dynamic memory is truly trivial compared to the cost of continuously refreshing the cache with new memory.
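For reference, the commands to reproduce this kind of profile look roughly like the following (./a.out stands in for your own binary; cachegrind writes its raw results to a file named cachegrind.out.<pid>):

    valgrind --tool=cachegrind ./a.out
    cg_annotate cachegrind.out.<pid>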
Also, I didn't mention this earlier, but the direct comparison that should be made is the cost of the frees against the cost of the cache misses. The frees cost 38,930 cycles in total, the cache misses 157,332,000: you saved 39 thousand cycles, and paid for it with over 150 million.

-
Well, if you don't reuse part of the memory, it would be gone from the cache anyway, so I don't see the problem there. But your second point is interesting; reallocating over the same memory could indeed increase cache hits and therefore reduce cache misses. – MetallicPriest Jun 03 '14 at 17:07
-
@MetallicPriest Not reusing memory is bad, because it costs cycles to evict that memory from the cache and replace it with other memory. That's the whole problem. – Steve Cox Jun 03 '14 at 17:08
-
These are big memory allocations. Would be interesting to check this with small memory allocations. – MetallicPriest Jun 03 '14 at 17:58
-
@MetallicPriest Small memory allocations mean you'll probably chew through first-level cache misses too. But like I said, test your own code. If the deallocations are really that costly, consider using a different allocator *before* you consider ignoring deallocation altogether. – Steve Cox Jun 03 '14 at 18:06
Don't you think it's a good idea not to deallocate memory at all during the execution of that program?
Cutting corners is not a "good idea". I can think of plenty of reasons to deallocate memory; I can't think of quite so many to leave unused memory allocated. If you don't want to deal with memory issues, write in a language with automatic garbage collection, or, since you're writing in C++, use smart pointers.
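For instance, here is a minimal sketch of the smart-pointer route (needs C++14 for std::make_unique; the 100 MB figure just mirrors the question):

    #include <memory>

    int main() {
        // Owns a 100 MB buffer; it is freed automatically when 'buf'
        // goes out of scope, with no manual delete[] to forget.
        auto buf = std::make_unique<char[]>(100 * 1024 * 1024);
        buf[0] = 'x';  // ... use the buffer ...
        return 0;      // deallocation happens here, automatically
    }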
only consumes 100 MB of memory
That's the most egregious misuse of the word "only" I have ever seen. 100 MB is a serious chunk of change. That's ~1/40th of the memory you have, and if even 40 other programs running on the system were written with the same perspective, at best you'd have the system come to a grinding halt because of swap delays, and at worst a complete system crash. Usually, there are more than 40 applications running on any given (modern) machine while it's turned on.
also runs for a limited time, like only 30 seconds
Again, 30 seconds is a long time for a computer. But, to clarify, do you mean 30 seconds ever, or 30 seconds every day? Because if it's a one-off and you're low on time, maybe that's an acceptable loss, but it certainly isn't good practice.
TL;DR?
If you have more important, immediate considerations (e.g. a product deadline is approaching) and you've got no other reasonable choice, do it. Otherwise, you've got no excuse. Hogging memory is never something you should feel good about; it can seriously slow down processes that really do need that memory. Yes, the OS cleans up resources after the program ends. No, that does not give you carte blanche to write programs which abuse their resources.

No, it is not good programming style. Virtually all modern OSes will clean up the memory afterwards, but it will remain a leak for as long as the program runs.

-
But could it improve the performance of the application by eliminating the overhead of deallocation? – MetallicPriest Jun 03 '14 at 17:03
-
@MetallicPriest It depends; if you can defer the deallocation until the program ends, it should be the same. As I see it, either you bulk-clean up or the OS does it :) – Morten Jensen Jun 03 '14 at 17:09
-
I really don't think it's a great idea to ever trust something out of your control to do something for you. – Kris Morness Jun 03 '14 at 17:35
-
If you want to avoid overhead from cleanup, you can implement an allocator that permits that while still maintaining good practices. For example, allocate a memory pool on program start-up, replace the global `operator new` with a version that takes memory from the pool, and replace the global `operator delete` with a no-op. Then free the memory pool before program exit. You could also use more fine-grained pools to cut down on the deallocation overhead while still allowing the program to run without continuously growing memory usage.
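A minimal sketch of that strategy (hypothetical and single-threaded; a real version would need thread safety and a fallback for when the pool runs out):

    #include <cstdlib>
    #include <cstddef>
    #include <new>

    // Fixed-size bump-pointer pool carved out once at start-up (size is assumed).
    static const std::size_t kPoolSize = 128 * 1024 * 1024;
    static char*       g_pool   = static_cast<char*>(std::malloc(kPoolSize));
    static std::size_t g_offset = 0;

    void* operator new(std::size_t n) {
        // Bump allocation with 16-byte alignment: O(1), no per-block bookkeeping.
        std::size_t aligned = (g_offset + 15) & ~static_cast<std::size_t>(15);
        if (g_pool == nullptr || aligned + n > kPoolSize) throw std::bad_alloc();
        g_offset = aligned + n;
        return g_pool + aligned;
    }

    void operator delete(void*) noexcept {
        // Deliberate no-op: the whole pool is released in one shot at exit.
    }

    int main() {
        int* p = new int(42);  // served from the pool
        delete p;              // costs nothing
        std::free(g_pool);     // one bulk cleanup before exit
        return 0;
    }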
Another alternative is to simply use a faster implementation of `new`/`delete`. The built-in versions are traditionally slow, but they don't have to be; drop-in allocators such as tcmalloc or jemalloc are common examples.
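For example (the library path is hypothetical and varies by system), such an allocator can often be interposed at run time without recompiling, since the default `operator new`/`operator delete` sit on top of malloc/free:

    LD_PRELOAD=/usr/lib/libjemalloc.so ./your_program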
