1

Say I have a std::unordered_map<std::string, int> that represents a word and the number of times that word appeared in a book, and I want to be able to sort it by the value.
The problem is, I want the sorting to be stable, so that in case two items have equal value I want the one who got inserted first to the map to be first.

It is simple to implement it by adding addition field that will keep the time it got inserted. Then, create a comperator that uses both time and the value. Using simple std::sort will give me O(Nlog(N)) time complexity.

In my case, space is not an issue whenever time can be improved. I want to take advantage of it and do a bucket sorting. Which should give me O(N) time complexity. But when using bucket sorting, there is no comperator, when iterating the items in the map the order is not preserved.

How can I both make it stable and still keep the O(N) time complexity via bucket sorting or something else?
I guess that if I had some kind of hash map that preserves the order of insertion while iterating it, it would solve my issue.

Any other solutions with the same time complexity are acceptable.

Note - I already saw this and that and due to the fact that they are both from 2009 and that my case is more specific I think, I opened this question.

A. Sarid
  • 3,916
  • 2
  • 31
  • 56
  • The example cannot be right. How do you distinguish two equal strings? Even if they were more complex objects with string keys, I do not see what the difficulty would be. – Jive Dadson Mar 10 '18 at 20:31
  • @JiveDadson If they are equal I will increment the counter. Maybe I didn't get your question. – A. Sarid Mar 10 '18 at 20:40

1 Answers1

1

Here is a possible solution I came up with using an std::unordered_map and tracking the order of inserting using a std::vector.

  1. Create a hash map with the string as key and count as value.
    In addition, create a vector with iterators to that map type.
  2. When counting elements, if the object is not yet in the map, add to both map and vector. Else, just increment the counter. The vector will preserve the order the elements got inserted to the map, and the insertion / update will still be in O(1) time complexity.
  3. Apply bucket sort by iterating over the vector (instead of the map), this ensures the order is preserved and we'll get a stable sort. O(N)
  4. Extract from the buckets to make a sorted array. O(N)

Implementation:

    unordered_map<std::string, int> map;
    std::vector<std::unordered_map<std::string,int>::iterator> order;

    // Lets assume this is my string stream
    std::vector<std::string> words = {"a","b","a" ... };

    // Insert elements to map and the corresponding iterator to order
    for (auto& word : words){
        auto it = map.emplace(word,1);
        if (!it.second){
            it.first->second++;
        }
        else {
            order.push_back(it.first);
        }
        max_count = std::max(max_count,it.first->second);
    }

    //  Bucket Sorting
    /*  We are iterating over the vector and not the map
        this ensures we are iterating by the order they got inserted */
    std::vector<std::vector<std::string>> buckets(max_count);
    for (auto o : order){
        int count = o->second;
        buckets[count-1].push_back(o->first);
    }

    std::vector<std::string> res;
    for (auto it = buckets.rbegin(); it != buckets.rend(); ++it)
        for (auto& str : *it)
            res.push_back(str);
A. Sarid
  • 3,916
  • 2
  • 31
  • 56