8

I am currently evaluating a few scalable memory allocators, namely nedmalloc and ptmalloc (both built on top of dlmalloc), as a replacement for the default malloc / new because of significant contention seen in a multithreaded environment. Their published performance seems good, but I would like to hear the experiences of people who have actually used them.

  • Were your performance goals satisfied?
  • Did you experience any unexpected or hard to solve issues (like heap corruption)?
  • If you have tried both ptmalloc and nedmalloc, which of the two would you recommend? Why (ease of use, performance)?
  • Or perhaps you would recommend another scalable allocator (free with a permissive license preferred)?
Suma
  • By the way have you evaluated the Hoard allocator (http://www.hoard.org)? –  Mar 25 '10 at 09:47
  • 2
    I did not, because its GPL license is not acceptable in this case (and its commercial license seems way too costly to us). – Suma Mar 25 '10 at 10:13
  • Since it is important to me could you please explain why GPL is not acceptable? What makes it unacceptable in your case? –  Mar 25 '10 at 10:31
  • 2
    Releasing our product under GPL license is not an option for this product. – Suma Mar 25 '10 at 10:56

4 Answers

6

I have integrated NedMalloc into our application and I am quite content with the results. The contention I had seen before was gone, and the allocator was quite easy to plug in. General performance was also very good, to the point that the overhead of memory allocation in our application is now close to unmeasurable.

I did not try ptmalloc, as I did not find a Windows-ready version of it, and I lost motivation once NedMalloc worked fine for me.

Besides the two mentioned, I think it could also be interesting to try TCMalloc. It has some features which sound better than NedMalloc in theory (like very little overhead for small allocations, compared to the 4 B header used by NedMalloc). However, as it does not seem to have a Windows port ready, it might turn out to be not exactly easy.


After a few weeks of using NedMalloc I was forced to abandon it, because its space overhead proved to be too high for us. What hit us in particular was that NedMalloc seems to do a poor job of returning memory it no longer uses to the OS, keeping most of it still committed. For now I have replaced it with JEMalloc, which is not quite as fast (it is still fast, but not as fast as NedMalloc was), but it is very robust in this respect and its scalability is also very good.


And after a few months of using JEMalloc I have switched to TCMalloc. It took more effort to adapt it for Windows than the other ones did, but its results (both performance and fragmentation) seem to be the best for us of what I have tested so far.

Suma
  • Can you elaborate on the changes you've made to TCmalloc? We're having the opposite problem, where TCmalloc on Windows does not return memory to the system properly. (On Linux it uses madvise(MADV_DONTNEED) to return physical memory, but there is no equivalent on Windows.) How have you resolved this issue? – skoy May 13 '13 at 07:22
  • 2
    @skoy You can find sources of all our allocators at http://community.bistudio.com/wiki/ArmA_2:_Custom_Memory_Allocator - the TCMalloc based one is at ftp://downloads.bistudio.com/arma2.com/update/Allocs/TCMalloc_source.7z. You can notice we have changed TCMalloc_SystemAlloc and TCMalloc_SystemRelease quite a lot. Note: We have switched to Intel TBB allocator meanwhile, based on large scale performance and stability tests. – Suma May 13 '13 at 07:58
  • Thank you, that is incredibly helpful! – skoy May 14 '13 at 11:29
  • The FTP server the source code was hosted at seems to be down. Is there an alternative way to obtain the source, or at least the DLLs themselves? (I tried contacting BI support email, but haven't received a response yet.) – skoy May 16 '13 at 09:11
  • 1
    @skoy Download links on the page fixed, the link is now http://downloads.bistudio.com/arma2.com/update/Allocs/TCMalloc_source.7z – Suma May 22 '13 at 12:35
  • Thanks! I also got an email about it from a BI support rep. Thanks again for all the help! – skoy May 24 '13 at 11:11
4

In the past I have needed a very fast method to allocate memory. I found that there wasn't an allocator that was up to the job.

After a couple of days of searching I came upon boost::pool, which gave our application a performance increase of 300x.

We effectively just call malloc/free on the pool for the objects we want to create. There is a little setup overhead, since a large amount of memory has to be malloc'ed to begin with, but once that is done, it is very fast.
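To illustrate why a pool can be so much faster than a general-purpose allocator, here is a minimal sketch of a fixed-size pool. It is not boost::pool itself: the class name `FixedPool` and its members are made up for this example, and it assumes single-threaded use. Memory is taken from malloc in large blocks, and each allocate/deallocate is then just a pointer swap on a free list.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size pool (illustration only, not boost::pool).
// Not thread-safe: each thread would need its own pool or a lock.
class FixedPool {
    struct Slot { Slot* next; };  // a free slot stores the link to the next one

public:
    FixedPool(std::size_t object_size, std::size_t objects_per_block)
        : slot_size_(round_up(object_size < sizeof(Slot) ? sizeof(Slot) : object_size)),
          objects_per_block_(objects_per_block == 0 ? 1 : objects_per_block) {}

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool() {
        for (void* block : blocks_) std::free(block);  // release everything at once
    }

    // Pop a slot from the free list, growing by one block when it is empty.
    void* allocate() {
        if (free_list_ == nullptr) grow();
        Slot* slot = free_list_;
        free_list_ = slot->next;
        return slot;
    }

    // Push the slot back onto the free list; nothing is returned to malloc here.
    void deallocate(void* p) {
        free_list_ = new (p) Slot{free_list_};
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    // Round the slot size up so every slot is suitably aligned for any type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    // The only place that calls malloc: one call per objects_per_block_ objects.
    void grow() {
        blocks_.reserve(blocks_.size() + 1);  // so push_back below cannot throw
        char* block = static_cast<char*>(std::malloc(slot_size_ * objects_per_block_));
        if (block == nullptr) throw std::bad_alloc();
        blocks_.push_back(block);
        for (std::size_t i = 0; i < objects_per_block_; ++i) {
            deallocate(block + i * slot_size_);
        }
    }

    std::size_t slot_size_;
    std::size_t objects_per_block_;
    Slot* free_list_ = nullptr;
    std::vector<void*> blocks_;
};
```

A pool like this would typically be created once per object type, for example `FixedPool pool(sizeof(MyObject), 1024);`, with objects constructed via placement new into the memory returned by `pool.allocate()`.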

Mumbles
1

I tried to go down your path a while ago when faced with multi-threaded contention and a severe fragmentation problem. After quite a bit of testing I concluded that the benefit of these allocators was negligible in most of the interesting cases I had.

The real solution was to roll my own memory manager, specialized for the tasks I was doing most often.

shoosh
1

If you are on Win32, my experience has been that it's hard to beat the regular Windows heap manager, provided you enable the Low Fragmentation Heap using the HeapSetInformation API. I believe this is now standard on newer versions of Windows. It handles locking using Interlocked* Win32 primitives rather than simpler Mutex/CritSec locking.
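For reference, a minimal sketch of requesting the Low Fragmentation Heap for the default process heap. The helper name `enable_low_fragmentation_heap` is made up for this example; the call itself uses `HeapSetInformation` with `HeapCompatibilityInformation` and the value 2, which selects the LFH. The non-Windows branch exists only so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks Windows to use the LFH for the process heap.
// Returns true on success. The call may fail, for example when the heap
// was created with incompatible flags or debug heap options are active.
bool enable_low_fragmentation_heap() {
#ifdef _WIN32
    ULONG heap_mode = 2;  // 2 = low-fragmentation heap
    return HeapSetInformation(GetProcessHeap(),
                              HeapCompatibilityInformation,
                              &heap_mode,
                              sizeof(heap_mode)) != FALSE;
#else
    return false;  // assumption for this sketch: nothing to do off Windows
#endif
}
```

Calling it once early in `main`, before the heap sees heavy use, is the usual pattern. Heaps other than the default one (for example a CRT heap in some older runtimes) would need the same call with their own handle.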

Steve Townsend
  • 1
    It may be hard to beat it in single-threaded performance and fragmentation, but unfortunately it is far from scalable to multiple cores. It seems to lack the "thread caching" offered by other scalable allocators, which they use to avoid locking completely in typical situations. – Suma Aug 12 '10 at 14:26
  • Fair enough. If/when you have measured using some of those, please let me know your results compared to LFH here. – Steve Townsend Aug 12 '10 at 16:28
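
The "thread caching" idea from the comments above can be sketched roughly as follows. This is a simplified illustration, not how TCMalloc or any other allocator is actually implemented: all names are hypothetical, there is a single size class, and plain malloc/free stands in for the shared central heap. The point is that the common path touches only thread-local state and therefore needs no lock.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical names throughout. Real allocators keep many size classes
// and move blocks between thread caches and a central heap in batches.
constexpr std::size_t kBlockSize = 64;  // the single size class of this sketch
constexpr std::size_t kMaxCached = 32;  // cap on blocks held per thread

struct FreeNode { FreeNode* next; };

struct ThreadCache {
    FreeNode* head = nullptr;
    std::size_t count = 0;

    // Runs at thread exit: hand cached blocks back to the shared heap.
    ~ThreadCache() {
        while (head != nullptr) {
            FreeNode* node = head;
            head = node->next;
            std::free(node);
        }
    }
};

// One cache per thread; accessing it involves no synchronization.
inline ThreadCache& local_cache() {
    thread_local ThreadCache cache;
    return cache;
}

// Fast path: pop from the thread's own list. Slow path: shared heap.
inline void* cached_alloc() {
    ThreadCache& cache = local_cache();
    if (cache.head != nullptr) {
        FreeNode* node = cache.head;
        cache.head = node->next;
        --cache.count;
        return node;
    }
    void* p = std::malloc(kBlockSize);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

// Fast path: push onto the thread's own list. Overflow goes to the shared heap.
inline void cached_free(void* p) {
    if (p == nullptr) return;
    ThreadCache& cache = local_cache();
    if (cache.count < kMaxCached) {
        cache.head = new (p) FreeNode{cache.head};
        ++cache.count;
    } else {
        std::free(p);
    }
}
```

In this sketch a block freed on a different thread than the one that allocated it simply joins the freeing thread's cache, and the `kMaxCached` cap bounds how much memory each thread can hold back.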