4

I have an unordered_map storing large objects:

std::unordered_map<uint64_t, LargeObject> map;

LargeObject is an array of POD/there are no pointer members.

I am receiving a lot of LargeObjects, so I only want to insert them in the map if they don't already exist (there will be a lot of times they do already exist, as I am listening to multiple sources).

Performance matters, as I am receiving millions.

There seem to be several ways of doing this:

map[key] = largeObject;

or:

map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject);

or:

if(map.count(key) == 0)
{
    map[key] = largeObject;
}

or:

auto iter = map.find(key);
if(iter == map.end())
{
    map[key] = largeObject;
}

There maybe more I haven't included.

Which is the most efficient technique to insert when the map's value is a large object?

user997112
  • 29,025
  • 43
  • 182
  • 361
  • 1
    Try [this function](https://en.cppreference.com/w/cpp/container/unordered_map/emplace). It adds only if value not existed. Also it constructs object in place so move constructor will be used well. It is fastest way to insert. – Arty Feb 17 '21 at 17:54

2 Answers2

9

This is what try_emplace is for. The API is set up in such a way as to avoid constructing nodes, or even the value, unless it's strictly necessary. It comes form N4279.

In your case, that would be:

auto [it, success] = map.try_emplace(key, largeObject);

Each of the four options in the OP has issues:

  • map[key] = largeObject doesn't actually do what you're asking for, it would overwrite the existing item. And even if it wasn't there, it requires default construction and copy assignment.

  • The approaches with count and find both require two lookups.

  • map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject)); is a single lookup but requires constructing the large object, and the pair, unconditionally.

Not mentioned in the OP is another option: map.emplace(key, largeObject); This has the issue that it's actually under-specified whether or not the pair is created in the case that the key exists. It does on some implementations. The motivation for try_emplace was to properly specify the API so that the pair definitely does not get created if the key already exists.

Barry
  • 286,269
  • 29
  • 621
  • 977
1

If "Performance matters", and the map is large don't use std::unordered_map, since it is known to be quite slow; and you are barking up the wrong tree w.r.t. insertion.

Here's a by-now-old SO question suggesting several alternatives:

Super high performance C/C++ hash map (table, dictionary)

Of course, you will need to choose something that fits your workload best.

Additionally - if your objects are large, it is probably better to avoid a map structure templated over the actual large object - since that usually guarantees you will at least have to copy it when inserting into the map; and you don't want to have to copy large objects. Use some kind of reference type (e.g. a pointer, or some numeric index usable to get the address of the large object), and have your hasher dereference that.

einpoklum
  • 118,144
  • 57
  • 340
  • 684