
In my question "profiling: deque is 23% of my runtime", I have a problem with 'new' being a large % of my runtime. The problems are:

I have to use the new keyword a lot, on many different classes/structs (I have >200 of them, and it's by design). I use lots of STL objects, iterators and strings. I also use strdup and other allocation (or free) functions.

I have one function that is called >2 million times. All it did was create STL iterators, and it took up >20% of the time (although from what I remember the STL is optimized pretty nicely, and debug builds make it magnitudes slower).

But keeping in mind that I need to allocate and free these iterators >2M times, along with other functions that are called often: how do I optimize new and malloc? Especially for all these classes/structs, and for the classes/structs I didn't write (the STL and others)?

Regardless, profiling says that I (and the STL?) use the new keyword more than anything else.

  • What version of what compiler are you using? – ildjarn Apr 07 '11 at 03:49
  • "The fastest way of doing something is not doing it at all". Why do you allocate 2 million items? Can some of this be avoided? If you have some specific bottleneck for one type of objects, a pool allocator might help. Trying to write a general allocator that is faster than the one provided by the compiler is, well - optimistic. – Bo Persson Apr 07 '11 at 07:35

2 Answers


Look for opportunities to avoid the allocation/freeing, either by adding your own management layer to recycle memory and objects that have already been allocated, or by modifying the containers' allocators. There are plenty of articles on STL allocators.

I have seen large multimap code go from unusably slow to very fast simply by replacing the default allocator.
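As a rough, untested sketch of what "replacing the default allocator" can look like (the FreeListAllocator name and the multimap instantiation below are mine, not from this answer; a production version would also need thread safety and real pooling):

```cpp
#include <cstddef>
#include <map>
#include <new>
#include <string>

// Hypothetical C++11-style minimal allocator: single-node deallocations go
// onto a per-type free list and are reused, so most allocate/deallocate
// pairs never touch the global heap. Not thread-safe, and memory is never
// returned until the process exits; illustration only.
template <class T>
struct FreeListAllocator {
    using value_type = T;

    FreeListAllocator() = default;
    template <class U>
    FreeListAllocator(const FreeListAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n == 1) {
            if (free_list_) {                          // reuse a recycled node
                Node* node = free_list_;
                free_list_ = node->next;
                return reinterpret_cast<T*>(node);
            }
            return reinterpret_cast<T*>(::operator new(sizeof(Node)));
        }
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (n == 1) {                                  // recycle instead of freeing
            Node* node = reinterpret_cast<Node*>(p);
            node->next = free_list_;
            free_list_ = node;
        } else {
            ::operator delete(p);
        }
    }

    union Node { Node* next; alignas(T) unsigned char storage[sizeof(T)]; };

private:
    static Node* free_list_;                           // one list per element type
};

template <class T>
typename FreeListAllocator<T>::Node* FreeListAllocator<T>::free_list_ = nullptr;

template <class T, class U>
bool operator==(const FreeListAllocator<T>&, const FreeListAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const FreeListAllocator<T>&, const FreeListAllocator<U>&) { return false; }

// Drop it in as the fourth template argument in place of std::allocator:
using FastMultimap =
    std::multimap<int, std::string, std::less<int>,
                  FreeListAllocator<std::pair<const int, std::string>>>;
```

The specific pool doesn't matter much; the win comes from recycling the container's fixed-size nodes instead of making a trip to the general-purpose heap for every insert and erase.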

holtavolt
  • Boost has a standards-compliant pool allocator that can be plugged right into STL containers (see the usage sketch after these comments): http://www.boost.org/doc/libs/release/libs/pool/doc/interfaces/pool_alloc.html – Emile Cormier Apr 07 '11 at 04:44
  • +1, `new` in the STL only means it is delegating to the allocator which the user specified. I have one program where the check of the allocator's result against zero actually has a measurable impact, though! – Potatoswatter Apr 07 '11 at 04:46
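If pulling in Boost is an option, the plug-in described in the comment above is roughly a one-line change. This is a hedged sketch: the key/value types are just examples, and fast_pool_allocator is the Boost.Pool variant aimed at containers that allocate one node at a time.

```cpp
#include <map>
#include <string>
#include <boost/pool/pool_alloc.hpp>

// Sketch: swap std::allocator for Boost.Pool's allocator in a node-based
// container such as std::multimap, std::map, or std::list.
using PooledMultimap =
    std::multimap<int, std::string, std::less<int>,
                  boost::fast_pool_allocator<std::pair<const int, std::string>>>;
```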

You can't make malloc faster. You might be able to make new faster, but I bet you can find ways not to call them so much.
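As a hedged sketch of the "make new faster" direction (my own example, not this answer's code): a hot type can be given class-level operator new/delete backed by a free list, so repeated new/delete of that type stops going through the general-purpose heap. The Token type here is made up, and this version is not thread-safe.

```cpp
#include <cstddef>
#include <new>

struct Token {
    int kind = 0;
    const char* text = nullptr;

    static void* operator new(std::size_t size) {
        if (free_list) {                               // pop a recycled block
            void* p = free_list;
            free_list = *static_cast<void**>(p);
            return p;
        }
        return ::operator new(size);                   // fall back to the heap
    }

    static void operator delete(void* p) noexcept {    // push instead of freeing
        if (p == nullptr) return;
        *static_cast<void**>(p) = free_list;
        free_list = p;                                 // kept until process exit
    }

private:
    static void* free_list;
};

void* Token::free_list = nullptr;
```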

One way to find excess calls to anything is to peruse the code looking for them, but that's slow and error-prone, and they're not always visible.

A simple and foolproof way to find them is to pause the program a few times and look at the call stack. Notice that you don't really need to measure anything: if something takes a large fraction of the time, that fraction is roughly the probability you will see it on each pause, and the goal is to find it. You do get a rough measurement, but that's only a by-product of finding the problem.

Here's an example where this was done in a series of stages, resulting in large speedup factors.

Mike Dunlavey
  • Actually I am parsing a generated text file, and I need new for every key in there. It's 8 MB of text... I think there were >1 million keys, not to mention values, so there are a lot of calls. However, it turns out I only call malloc 3 times (it's in a lib, not my own code), while my new count is >1 million, along with strdup, which is pretty much a malloc. I made that faster by allocating a large chunk at once and removing the bookkeeping (because I won't need to delete anything unless I delete everything); see the arena sketch after this comment thread. -edit- I got a >20% speedup from doing this. But my strdup doesn't return a char*; it returns my own efficient string obj. –  Apr 08 '11 at 20:36
  • @acidzombie24: I'm just guessing because I don't know what you're trying to do, but every piece of information has a life between when it's acquired and when it's needed, and it only needs storage between those times. How is the file generated? How long do you need to store keys? It sounds like the big alloc was smart. What makes your string obj more efficient? Those are the kinds of design questions I would have, but random-pausing will locate the problems with certainty, so you're not optimizing things that don't need it. – Mike Dunlavey Apr 09 '11 at 01:50
  • Actually I did the random pausing as well. It went to a heavy-duty lib I use, to new, and to strdup. Now it only goes to that lib. Well... my string obj is `struct MyString{int len; char sz[0]};` and I made sure to align the ptr to 4 bytes (I may change len to 2 bytes, but 4 is fine). I won't get into the lifetimes, but keeping objects in memory until the app closes IS acceptable, since it only runs for a few seconds each time. Also, my string is never modified and I check the length on compare. I may drop that and cmp the first 4 bytes of sz; the problem is what happens if it's 2 letters, since the 4th byte should never be considered. –  Apr 09 '11 at 02:12
  • @acidzombie24: 1) What lib, and it went to the lib in a good % of samples, but from where? Every line on the stack is something that could save time. 2) So where does the data go? I.e., if it goes straight into a DB, maybe you could do it as you go along and use a fixed array, thus cutting down on all the new-ing. 3) If you don't see more than 1 stack sample stopping in the string compare, it doesn't take enough time to bother optimizing. If you *do* see it more than once, strcmp stops at the first unequal byte, so if you need to speed it up, you might just compare the first byte rather than the length. – Mike Dunlavey Apr 09 '11 at 16:43
  • Actually the strcmp code isn't in yet; I need to write several unit tests for the parsing first. Really, I just left a comment to say malloc CAN be made faster, like 1M strings with <5 allocations for the whole job (as opposed to 1M standard memory allocations). I'll +1 this answer because it's good. –  Apr 09 '11 at 20:32
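For readers following this thread, here is a rough reconstruction of the arena approach described above. It is a guess at the shape of the code, not the commenter's actual implementation: the Arena class and its names are made up, and since the zero-length `sz[0]` member is a compiler extension, the sketch uses the portable `sz[1]` struct-hack form.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Length-prefixed string stored inline in the arena; characters follow len.
struct MyString {
    int len;
    char sz[1];          // struct hack: the real characters overflow this array
};

// Bump allocator: grab big blocks up front, hand out 4-byte-aligned slices,
// and free everything at once when the arena goes away. No per-object
// bookkeeping, which matches "delete nothing unless I delete everything".
class Arena {
public:
    explicit Arena(std::size_t block_size = 1 << 20) : block_size_(block_size) {}
    ~Arena() { for (char* b : blocks_) delete[] b; }   // delete everything at once

    void* allocate(std::size_t n) {
        n = (n + 3) & ~std::size_t(3);                 // keep 4-byte alignment
        if (used_ + n > capacity_) {                   // current block exhausted
            std::size_t size = n > block_size_ ? n : block_size_;
            blocks_.push_back(new char[size]);
            capacity_ = size;
            used_ = 0;
        }
        void* p = blocks_.back() + used_;
        used_ += n;
        return p;
    }

    // strdup-like helper: one bump allocation per key instead of one malloc.
    MyString* make_string(const char* s) {
        std::size_t len = std::strlen(s);
        MyString* str = static_cast<MyString*>(allocate(sizeof(int) + len + 1));
        str->len = static_cast<int>(len);
        std::memcpy(str->sz, s, len + 1);
        return str;
    }

private:
    std::size_t block_size_, used_ = 0, capacity_ = 0;
    std::vector<char*> blocks_;
};
```

With around a million keys this turns roughly a million strdup/malloc calls into a handful of block allocations, which is consistent with the kind of speedup reported above.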