
I am iterating over a map and need to add elements to that map depending on a condition, in this case that an element is not found (it could be any other condition).

My main problem is that, with a large number of updates to apply, the application consumes all of the CPU and all of the memory.

State Class:

class State {
public:
    int getId() const { return id; }
    int getTimeStamp() const { return timeStamp; }
    int getState() const { return state; }

    void updateStateIfTimeStampIsHigher(const State& state);

private:
    int id;
    int timeStamp;
    int state;
};

Method in State:

void State::updateStateIfTimeStampIsHigher(const State& state) {
    if (this->id == state.getId() && state.getTimeStamp() > this->getTimeStamp()) {
        this->timeStamp = state.getTimeStamp();
        this->state = state.getState();
    }
}

Loop Code:

std::map<int, State> data;

const std::map<int, State>& update; // bound elsewhere; shown here for context

for (auto const& updatePos : update) {
    if (updatePos.first != this->toNodeId) {
        std::map<int, State>::iterator message = data.find(updatePos.first);
        if (message != data.end() && message->first) { // the second check also skips entries whose key is 0
            message->second.updateStateIfTimeStampIsHigher(updatePos.second);
        } else {
            data.insert(std::make_pair(updatePos.first, updatePos.second));
        }
    }
}

Looking at the KCacheGrind data, it seems the data.insert() line takes most of the time and memory. I am new to KCacheGrind, but this line appeared to account for around 72% of the cost.
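
The loop above also does a find() followed by a separate insert() for every missing key, i.e. two tree traversals per new element. Below is a minimal sketch (not from the original post) of a single-lookup variant, relying only on the pair<iterator, bool> that std::map::insert returns, and assuming the same data, update and toNodeId as in the snippets above:

// Sketch only: insert() returns {iterator, bool}, so the find() and the
// insert() collapse into a single tree lookup.
for (auto const& updatePos : update) {
    if (updatePos.first != this->toNodeId) {
        auto result = data.insert(std::make_pair(updatePos.first, updatePos.second));
        if (!result.second) {
            // Key was already present: only merge the newer state into it.
            result.first->second.updateStateIfTimeStampIsHigher(updatePos.second);
        }
    }
}

This trades the second lookup for constructing the pair (and copying the State) even when the key is already present.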

Do you have any suggestions on how to improve this?

Sebastian D'Agostino
    Have you considered [`std::unordered_map`](http://en.cppreference.com/w/cpp/container/unordered_map) for `data`? You should read this : https://stackoverflow.com/questions/3902644/choosing-between-stdmap-and-stdunordered-map – François Andrieux Sep 08 '17 at 13:21
  • 1) what Francois suggests 2) how about changing the data and algorithms? For example do you need `int` for key? What is `State`? Is it small enough that the copy constructor is fast, or can it be changed to be small? Is the `int` key "dense" enough to use `std::vector` instead without keys, and just have empty gaps at unused indices? etc, etc... – Ped7g Sep 08 '17 at 13:26
  • Since the two maps have the same structure, you could use hinted insertion, since you're guaranteed to insert at the end. That doesn't save on allocations, but on search time. – Kerrek SB Sep 08 '17 at 13:28
  • Does the `update` map have to remain unmodified? If you could cannibalise it, you could save on reallocations. – Kerrek SB Sep 08 '17 at 13:31
  • Your algorithm has `n log n` complexity, whereas it can be done in linear time using the equivalent of a merge algorithm. – Jarod42 Sep 08 '17 at 14:25
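
A minimal sketch of the linear merge Jarod42 mentions, assuming both containers are std::map<int, State> (and therefore sorted by the same key) and the same data, update and toNodeId as in the question:

// Sketch: a single pass over both sorted containers, advancing whichever
// iterator is behind instead of doing a fresh lookup per update.
auto dst = data.begin();
for (auto const& updatePos : update) {
    if (updatePos.first == this->toNodeId) {
        continue;
    }
    // Advance until `dst` is at or past the current update key.
    while (dst != data.end() && dst->first < updatePos.first) {
        ++dst;
    }
    if (dst != data.end() && dst->first == updatePos.first) {
        dst->second.updateStateIfTimeStampIsHigher(updatePos.second);
    } else {
        // `dst` is the first element greater than the key (or end()), which is
        // exactly the hint emplace_hint wants: insert just before it.
        dst = data.emplace_hint(dst, updatePos.first, updatePos.second);
    }
}

Because dst only ever moves forward, the merge is one pass over both maps, and each insertion lands immediately before its hint, which std::map handles in amortized constant time.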

2 Answers


Your question is quite general, but I see two things that could make it run faster:

  1. Use hinted insertion/emplacement. When you add a new element, its iterator is returned. Assuming both maps are ordered the same way, you can tell where the last element was inserted, so the next lookup should be faster (could use some benchmarking here).
  2. Use emplace_hint for faster insertion.

Sample code here:

std::map<int, long> data;

const std::map<int, long> update;   // filled elsewhere; shown here for context
auto recent = data.begin();         // remembers where the last element was inserted (the hint)

for (auto const& updatePos : update) {
    if (updateElemNotFound) {       // placeholder for the "key not already in data" check
        recent = data.emplace_hint(recent, updatePos);
    }
}

Also, if you want to trade memory for CPU time you could use unordered_map (Is there any advantage of using map over unordered_map in case of trivial keys?), but then the first point would no longer matter, since hinted insertion relies on the container being ordered.
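
A minimal sketch of that variant, assuming the same State, update and toNodeId as in the question; the reserve() call is there to limit rehashing while the map grows:

#include <unordered_map>

std::unordered_map<int, State> data;   // instead of std::map<int, State>

// Called once per merge pass: reserving buckets up front limits rehashing
// (and the memory churn that comes with it) while the map grows.
data.reserve(data.size() + update.size());

for (auto const& updatePos : update) {
    if (updatePos.first != this->toNodeId) {
        auto message = data.find(updatePos.first);   // O(1) on average
        if (message != data.end()) {
            message->second.updateStateIfTimeStampIsHigher(updatePos.second);
        } else {
            data.insert(updatePos);
        }
    }
}

The loop body is otherwise the same as in the question; only the container type (and the average-case lookup cost) changes.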

R2RT
  • Using emplace_hint did not provide better performance, but changing from map to unordered_map produced a small performance gain. I still have to work to improve this. – Sebastian D'Agostino Sep 09 '17 at 18:44

I found a satisfactory answer thanks to the comments on the question. Changing from map to unordered_map helped a little, but I still got unsatisfying results.

I ended up using Google's sparsehash, which provides better resource usage despite some drawbacks when erasing entries (which I do).

The code solution is as follows. First, I include the required header:

#include <sparsehash/sparse_hash_map>

Then, my new data definition looks like:

// Key-equality functor, passed as sparse_hash_map's EqualKey template parameter.
struct eqint {
    bool operator()(int i1, int i2) const {
        return i1 == i2;
    }
};

google::sparse_hash_map<int, State, std::tr1::hash<int>, eqint> data;

Since I have to use "erase", I have to do the following after constructing the sparse_hash_map:

data.clear_deleted_key();
data.set_deleted_key(-1); // -1 is reserved as the "deleted" marker and must never be used as a real key
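
For completeness, a sketch of what erasing then looks like; obsoleteId is an illustrative name, not something from the original code:

// With -1 reserved as the "deleted" marker via set_deleted_key() above,
// entries can be removed as usual.
data.erase(obsoleteId);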

Finally, my loop code changes very little:

for (auto const& updatePos : update) {
    if (updatePos.first != this->toNodeId) {
        google::sparse_hash_map<int, State, std::tr1::hash<int>, eqint>::iterator msgIt = data.find(updatePos.first);
        if (msgIt != data.end() && msgIt->first) {
            msgIt->second.updateStateIfTimeStampIsHigher(updatePos.second);
        } else {
            data[updatePos.first] = updatePos.second;
        }
    }
}

The time before making the changes for a whole application run under specific parameters was:

real    0m28,592s
user    0m27,912s
sys     0m0,676s

And the time after making the changes for the whole application run under the same specific parameters is:

real    0m37,464s
user    0m37,032s
sys     0m0,428s

I ran it with other cases and the results were similar (from a qualitative point of view). The system time and resource usage (CPU and memory) decreased and the user time increased.

Overall, I am satisfied with the trade-off, since I was more concerned about resource usage than execution time (the application is a simulator; previously it could not finish and produce results under really heavy load, and now it does).

Sebastian D'Agostino