3

I have an std::map and an std::unordered_set with the same key.

I want to remove all keys from the set that do not exist in the map.

My idea was to do something like the following:

#include <map>
#include <unordered_set>

int main()
{
    std::map<uint64_t, std::string> myMap = { 
                                                {1, "foo"},
                                                {2, "bar"},
                                                {3, "morefoo"},
                                                {4, "morebar"} 
                                            };

    std::unordered_set<uint64_t> mySet = { 1, 2, 3, 4, 123 };

    for (const auto key : mySet)
    {
        const auto mapIterator = myMap.find(key);

        if (myMap.end() == mapIterator)
            mySet.erase(key);
    }

    return 0;
}

And then invoke it on a timer callback every few seconds, however, the snippet above throws an assert when trying to delete the key 123 from mySet, stating:

List iterator is not incrementable.

The thing is, even if it didn't throw an exception I feel like this idea is far from elegant/optimal. I'm wondering if there is a better way to approach this problem?

Thanks.

not an alien
  • 651
  • 4
  • 13
  • 4
    `for (const auto key : mySet)` -- You are mutating the set while you're looping over it. That is a bad idea (*undefined behavior*). – PaulMcKenzie Dec 06 '18 at 14:07
  • This seems pretty efficient to me, assuming one container doesn't have massively more keys than the other. – hnefatl Dec 06 '18 at 14:09
  • @hnefatl -- Efficiency means nothing if the loop causes undefined behavior. – PaulMcKenzie Dec 06 '18 at 14:10
  • @PaulMcKenzie Absolutely, but your comment already explains that well. I didn't see the need to mention it again. – hnefatl Dec 06 '18 at 14:10
  • 1
    Your approach is quite ineffective - you iterate over `std::unordered_set` and doing lookup in `std::map`, which should be opposite (lookup in hash set is faster). But with current choose of container types different approach is not easy to implement. – Slava Dec 06 '18 at 14:11
  • It is quite possible that making a copy of `std::unordered_set` then iterating over map and deleting existing keys in the copy (leaving extra keys in the copy at the end) would be more efficient, but that depends on the data. But better change data structure. – Slava Dec 06 '18 at 14:15
  • 1
    Btw +1 this question - it is so seldom that questions have proper [mcve], this itself deserves appreciation. – Slava Dec 06 '18 at 14:17
  • to be honest I can't imagine the use case for `map`/`unordered_set`? maybe you can provide a bit more details? can't you just use `unordered_map` instead of two interrelated containers? – Andriy Tylychko Dec 06 '18 at 14:28
  • Thanks for the advice all. @Slava, could you suggest a more appropriate data structure than `std::unordered_set` for my use-case? I chose it because I thought it was the best fit, but clearly I am not qualified to make such a decision. The only important consideration is that I will be adding and removing values regularly. – not an alien Dec 06 '18 at 14:32
  • 1
    it's not possible to provide an advice about proper structure w/o additional info about your requirements – Andriy Tylychko Dec 06 '18 at 14:34
  • 1
    @notanalien I only observed that for this particular situation your approach is quite ineffective, but you have to choose considering other cases and weight them. Without knowing all of that I cannot really suggest anything. – Slava Dec 06 '18 at 14:35
  • 1
    It could be much more efficient to have both containers sorted. – n. m. could be an AI Dec 06 '18 at 15:04
  • @n.m. So if I were to use `std::set` and `std::map` together would it be possible for me to improve efficiency? – not an alien Dec 06 '18 at 15:48
  • You would be able to build the new set in O(N) time. – n. m. could be an AI Dec 07 '18 at 05:42

2 Answers2

2

As stated in the comments

for (const auto key : mySet)
{
    const auto mapIterator = myMap.find(key);

    if (myMap.end() == mapIterator)
        mySet.erase(key);
}

will have undefined behavior. When you erase element 123 from mySet you invalidate the iterator that the range based for loop is using. incrementing that iterator is not allowed after you do that. What you can do is switch to a regular for loop so you can control when the iterator is incremented like

for (auto it = mySet.begin(); it != mySet.end();)
{
    if (myMap.find(*it) == myMap.end())
        it = mySet.erase(it);
    else
        ++it;
}

and now you always have a valid iterator as erase will return the next valid iterator.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
2

As stated in answers to this question How to remove from a map while iterating it? you cannot erase elements in container while iterating over it in a for range loop, so your code should use iterator. As such answer already provided I would not duplicate it.

For efficiency you have a quite wrong approach - you are doing lookup in std::map while iterating std::unordered_set, which should be opposite as std::unorederd_set provides faster lookup time (otherwise there is not point for you use it instead of std::set). One of possible approach is to create a copy:

auto copySet = mySet;
for( const auto &p : myMap )
    copySet.erase( p.first );
for( const auto v : copySet )
    mySet.erase( v );

which could be more efficient, but that depends on your data. Better approach to choose proper data types for your containers.

Note: by wrong approach I mean efficiency only for this particular situation presented in your question, but this seem to be a part of a larger program and this solution can be right for it as there could be more important cases when this data structure works well.

Slava
  • 43,454
  • 1
  • 47
  • 90
  • 1
    Comparison of both solutions: http://quick-bench.com/T-NzXOy7MM8K10jjXrxOMhTu64I. Of course, it is just a benchmark with just one set of benchmark data. – Daniel Langr Dec 06 '18 at 14:36
  • @DanielLangr for this particular data set this approach will definitely loose, I already mentioned that it depends on data. – Slava Dec 06 '18 at 14:40
  • what can be a case when map/unordered_set combination makes sense? just curious – Andriy Tylychko Dec 06 '18 at 14:46
  • @AndriyTylychko I do not know, just assumed who designed this originally knew what they doing. – Slava Dec 06 '18 at 14:49
  • @AndriyTylychko, it probably doesn't make sense. I am still a total newbie to data structures. If you could shed some light on why it's a bad combination so I can go and read some more on the topic I would be extremely grateful. – not an alien Dec 06 '18 at 15:38
  • 1
    @notanalien: we *don't know* if it's a bad combination, just a bit unusual. you haven't provided details for us to be able to judge. after some thinking I can imaging a case where it could make sense, e.g. `map` is used because you need to be able to iterate your items in order, `unordered_set` could be added for faster (than `map`) lookups, and the whole business with tidying up `unoredered_set` on timer can be related with multithreading optimisation. but it's a pure guesswork – Andriy Tylychko Dec 06 '18 at 17:32