2

I have a C++ program that maintains a boost::flat_map. It receives real-time commands in the form of (key, value). If value is 0, then the flat_map[key] should be deleted if it exists. If value is nonzero, then the flat_map[key] should be set to value in the flat_map if the entry already exists, or it should be inserted if the entry does not already exist.

However, the commands do not come one by one. Instead, they come in batches, and the program only needs the flat_map to be sorted after each entire batch of commands is processed. It does not need the flat_map to be sorted while in the middle of processing a batch of commands.

Given this flexibility, is there a way to reduce processing time by avoiding the flat_map overhead of moving many elements on each insertion/deletion, and only incurring that overhead once at the end of each batch? The program is very latency sensitive.

Appreciate any input you may have!

galpo
  • 183
  • 1
  • 7
  • just a quick search, but it's seems like it do have `void insert(InputIterator first, InputIterator last);` https://www.boost.org/doc/libs/1_65_1/doc/html/boost/container/flat_map.html – apple apple Jun 15 '21 at 04:47
  • but on it's note *Complexity: At most N log(size()+N) (N is the distance from first to last) search time plus N\*size() insertion time* – apple apple Jun 15 '21 at 04:51
  • It says "inserts each element from the range [first,last) if and only if there is no element with key equivalent to the key of that element", but sometimes there will be an existing key equal to the new key, in which case the corresponding value should be updated, and I don't know that the insert function does that. Sorry I should have clarified that each command is simply a (key, value) pair, and the program doesn't know if it will be a new insertion or a value update before actually doing the search – galpo Jun 15 '21 at 05:36
  • isn't all insert have the same statement? – apple apple Jun 15 '21 at 05:46
  • I'm currently using this version, so I have the iterator if the key already exists: "std::pair< iterator, bool > insert(const value_type & x); Effects: Inserts x if and only if there is no element in the container with key equivalent to the key of x. Returns: The bool component of the returned pair is true if and only if the insertion takes place, and the iterator component of the pair points to the element with key equivalent to the key of x." – galpo Jun 15 '21 at 05:56

1 Answers1

3

You can use extract_sequence / adopt_sequence to update the underlying vector, and so long as it ends up ordered and uniqued, there's only a pair of vector moves in overhead.

auto underlying = my_map.extract_sequence();

// merge underlying and batch

my_map.adopt_sequence(boost::ordered_unique_range_t{}, std::move(underlying));
Caleth
  • 52,200
  • 2
  • 44
  • 75
  • isn't this discard all old values? – apple apple Jun 15 '21 at 13:42
  • @appleapple extract the existing values, update them how you like, then adopt them back – Caleth Jun 15 '21 at 13:49
  • maybe you can add some sample code, it's not trivial for me. and [`typedef implementation_defined sequence_type`](https://www.boost.org/doc/libs/1_76_0/doc/html/boost/container/flat_map.html) definitely doesn't help at all – apple apple Jun 15 '21 at 14:10
  • @appleapple it is explained further down. `boost::flat_map::sequence_type` is an alias for `std::vector, boost::new_allocator>>` – Caleth Jun 15 '21 at 16:23
  • @appleapple but the exact type isn't interesting. We really only need to know that it is `decltype(my_map)::sequence_type` – Caleth Jun 15 '21 at 16:27
  • @ Caleth, I obviously cannot merge (or add to) unknown container. – apple apple Jun 16 '21 at 07:58
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/233829/discussion-between-caleth-and-apple-apple). – Caleth Jun 16 '21 at 08:13
  • @Caleth Thanks! And what exactly would you recommend doing once the program has the extracted sequence, in order to handle all the insertions/deletions/updates in each batch of N commands? I was thinking using std::lower_bound to binary search the location of each key in the batch, effectively partitioning the extracted sequence into N+1 subsequences, and then doing N+1 std::move's with an O(1) insertion/deletion/update in between each such std::move. But not sure if this is the fastest way – galpo Jun 16 '21 at 09:34
  • @galpo If the batch is also uniqued and ordered, I'd write something with the same structure as [set_intersection](https://en.cppreference.com/w/cpp/algorithm/set_intersection#Possible_Implementation) or [inplace_merge](https://github.com/gcc-mirror/gcc/blob/d9375e490072d1aae73a93949aa158fcd2a27018/libstdc%2B%2B-v3/include/bits/stl_algo.h#L2508) – Caleth Jun 16 '21 at 09:38
  • @Caleth Would you recommend something with the same structure as set_intersection over what I described even if the size of each batch is far far smaller than the length of the extracted subsequence? I would think it would be unnecessarily slow to walk through the extracted subsequence element-by-element in that case – galpo Jun 16 '21 at 09:51
  • @galpo you could probably skip ahead with `std::lower_bound(current, end)` instead of `++current` in that case, but it remains similar – Caleth Jun 16 '21 at 09:52
  • It's really close to an `inplace_merge`, but for the removing 0s part. I can't think of a good way other than a second pass doing a remove_if – Caleth Jun 16 '21 at 09:57
  • @Caleth Ok I will try to understand its implementation, it looks nontrivial. By the way, whenever the smallest key in each batch of commands is pretty close to the largest element of the flat_map at that time (assuming the flat_map is sorted in ascending order), could using extract_sequence/adopt_sequence hurt performance, since that requires moving the underlying vector twice, while just calling insert() sequentially for each command in the batch would only require the moving of a small part of the vector? – galpo Jun 16 '21 at 10:36