3

I'm writing HFT trading software. I do care about every single microsecond. Now it written on C# but i will migrate to C++ soon.

Let's consider such code

// Original
class Foo {
....

    // method is called from one thread only so no need to be thread-safe
    public void FrequentlyCalledMethod() {
        var actions = new List<Action>();
        for (int i = 0; i < 10; i++) {
            actions.Add(new Action(....));
        }
        // use actions, synchronous
        executor.Execute(actions);
        // now actions can be deleted
    }

I guess that ultra-low latency software should not use "new" keyword too much, so I moved actions to be a field:

// Version 1
class Foo {
....

    private List<Action> actions = new List<Action>();

    // method is called from one thread only so no need to be thread-safe
    public void FrequentlyCalledMethod() {
        actions.Clear()
        for (int i = 0; i < 10; i++) {
            actions.Add(new Action { type = ActionType.AddOrder; price = 100 + i; });
        }
        // use actions, synchronous
        executor.Execute(actions);
        // now actions can be deleted
    }

And probably I should try to avoid "new" keyword at all? I can use some "pool" of pre-allocated objects:

// Version 2
class Foo {
....

    private List<Action> actions = new List<Action>();
    private Action[] actionPool = new Action[10];

    // method is called from one thread only so no need to be thread-safe
    public void FrequentlyCalledMethod() {
        actions.Clear()
        for (int i = 0; i < 10; i++) {
            var action = actionsPool[i];
            action.type = ActionType.AddOrder;
            action.price = 100 + i;
            actions.Add(action);
        }
        // use actions, synchronous
        executor.Execute(actions);
        // now actions can be deleted
    }
  • How far should I go?
  • How important to avoid new?
  • Will I win anything while using preallocated object which I only need to configure? (set type and price in example above)

Please note that this is ultra-low latency so let's assume that performance is preferred against readability maintainability etc. etc.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
Oleg Vazhnev
  • 23,239
  • 54
  • 171
  • 305
  • 9
    If it's that important, who are you gonna trust: People on the internet, or scientific benchmarks? –  Jan 09 '13 at 17:37
  • 3
    @delnan i do trust people on stackoverflow :) – Oleg Vazhnev Jan 09 '13 at 17:38
  • 6
    I would try both and make measurements. – dmaij Jan 09 '13 at 17:39
  • Memory allocation can be slow, but it doesn't have to be. – Chad Jan 09 '13 at 17:40
  • 1
    Please don't trust SO for this kind of thing haha but in our financial application, we've essentially pre-allocated huge chunks of memory, written a custom "memory manager", and "allocated" our new objects in this space. – im so confused Jan 09 '13 at 17:40
  • The reason why is that memory fragmentation and looking for a pool the size you requested become slower on creation of millions of objects. therefore, you should use program logic to pre-structure your memory – im so confused Jan 09 '13 at 17:41
  • I agree with AK4749, storing everything in memory instead of allocating is probably faster. – dmaij Jan 09 '13 at 17:41
  • Rewrite your application in C++, then profile it, then ask for suggestions. – Chad Jan 09 '13 at 17:43
  • @Chad the less objects i allocate dinamically the easy application to rewrite to c++ I guess so solving this problem is one of the steps to rewrite application to c++ – Oleg Vazhnev Jan 09 '13 at 17:44
  • @dmaij when I store pool of objects as field of method isn't I "storing everything in memory"? so my "Version 2" would be the fastest ? – Oleg Vazhnev Jan 09 '13 at 17:45
  • @javapowered, if you could store all your actions in memory, you would only allocate them once. Do take care of how you access them, ordered, hashed, btree etc since this is your next challenge. But, again, try both, measure and see which is faster (academic approach). calling clear each time also a performance penalty. – dmaij Jan 09 '13 at 17:49
  • If you care about performance, make or find a custom allocator object. Boost and MSVC both have many, and there's many more around the internet for free. – Mooing Duck Jan 09 '13 at 17:50
  • @MooingDuck will custom allocator be faster than pre-allocated field? – Oleg Vazhnev Jan 09 '13 at 17:53
  • I have removed the C++ tag. Except the comment that you might want to port this to C++ later on, the question is not C++ related, and in this particular case it does not make any sense to try and answer for both languages as memory management is very different. Also note that `new` has different meanings in C# depending on the characteristics of the type (value type vs. reference type), and that information is missing for the `Action` type. – David Rodríguez - dribeas Jan 09 '13 at 17:54
  • @javapowered: A custom allocator should be the same speed as a pre-allocated field, except easier to use correctly, works with standard containers, etc, etc, etc. – Mooing Duck Jan 09 '13 at 17:57
  • @MooingDuck thanks so on C# i probably should just use preallocated fields where possible but when migrated to c++ I can replace these fields with boost or another allocator. – Oleg Vazhnev Jan 09 '13 at 18:02
  • 1
    Keep in mind that in C# allocating a new object is *very* cheap. The GC has a pointer in memory to the free section of the heap, it moves the pointer up by the size of what you're allocating, and then runs the constructors. It's the garbage collection and de-fragmenting that makes heap objects more expensive than using stack memory, but that time isn't spent when the object is allocated, it's spent later. If this exact moment is very time sensitive, but there will be idle time at some point in the future (to run collections), then you may not have an issue. In C++ the exact opposite is true. – Servy Jan 09 '13 at 18:59

3 Answers3

4

In C++ you don't need new to create an object that has limited scope.

void FrequentlyCalledMethod() 
{
    std::vector<Action> actions;
    actions.reserve( 10 );
    for (int i = 0; i < 10; i++) 
    {
        actions.push_back( Action(....) );
    }
    // use actions, synchronous
    executor.Execute(actions);
    // now actions can be deleted
}

If Action is a base class and the actual types you have are of a derived class, you will need a pointer or smart pointer and new here. But no need if Action is a concrete type and all the elements will be of this type, and if this type is default-constructible, copyable and assignable.

In general though, it is highly unlikely that your performance benefits will come from not using new. It is just good practice here in C++ to use local function scope when that is the scope of your object. This is because in C++ you have to take more care of resource management, and that is done with a technique known as "RAII" - which essentially means taking care of how a resource will be deleted (through a destructor of an object) at the point of allocation.

High performance is more likely to come about through:

  • proper use of algorithms
  • proper parallel-processing and synchronisation techniques
  • effective caching and lazy evaluation.
CashCow
  • 30,981
  • 5
  • 61
  • 92
  • what do you mean by "effective caching"? are you talking about processor cache? I almost do not use parallel processing and synchronization. I almost use single thread for the 80% of the job, as I want to have job done in less than 100 microseconds, and probaly much less than 100 microseconds. will it be faster to use "concrete type" or use "preallocated object that only need to be configured"? – Oleg Vazhnev Jan 09 '13 at 18:00
  • put it this way, if it turns out using new or not (on hot code), has little impact on your performance, then your program is too slow. –  Jan 09 '13 at 18:07
  • both your own caching, and also ensuring that some operations ensure proper processor caching. We've had topics before about different collection sizes giving stark contrast in speed. – CashCow Jan 10 '13 at 15:36
2

As much as I detest HFT, I'm going to tell you how to get maximum performance out of each thread on a given piece of iron.

Here's an explanation of an example where a program as originally written was made 730 times faster.

You do it in stages. At each stage, you find something that takes a good percentage of time, and you fix it. The keyword is find, as opposed to guess. Too many people just eyeball the code, and fix what they think will help, and often but not always it does help, some. That's guesswork. To get real speedup, you need to find all the problems, not just the few you can guess.

If your program is doing new, then chances are at some point that will be what you need to fix. But it's not the only thing.

Here's the theory behind it.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

For high-performance trading engines at good HFT shops, avoiding new/malloc in C++ code is a basic.

Nathan Doromal
  • 3,437
  • 2
  • 24
  • 25