
I was solving a problem on InterviewBit and came across this question: https://www.interviewbit.com/problems/diffk-ii/. When I used the C++ STL `map` to solve it, it showed me the message:

Memory Limit Exceeded. Your submission didn't complete in the allocated memory limit.

Here's my code:

int Solution::diffPossible(const vector<int> &A, int B) {
    int n = A.size();
    map< int , int > mp;
    for(int i =0;i<n; i++)
        mp[A[i]] = i;
    int k = B;
    for(int i =0; i<n; i++){
        if(mp.find(A[i]+k) != mp.end() && mp[A[i]+k] != i){
            return 1;
        }
        if(mp.find(A[i]-k) != mp.end() && mp[A[i]-k] != i){
            return 1;
        }
    }

    return 0;
}

and when I replaced `map` with `unordered_map`, the solution was accepted. Here's the code:

int Solution::diffPossible(const vector<int> &A, int B) {
    int n = A.size();
    unordered_map< int , int > mp;
    for(int i =0;i<n; i++)
        mp[A[i]] = i;
    int k = B;
    for(int i =0; i<n; i++){
        if(mp.find(A[i]+k) != mp.end() && mp[A[i]+k] != i){
            return 1;
        }
        if(mp.find(A[i]-k) != mp.end() && mp[A[i]-k] != i){
            return 1;
        }
    }

    return 0;
}

It seems that `map` is taking more memory than `unordered_map`. Can anyone explain how this happens? Why does `map` take more memory space than `unordered_map`?

  • This is a difficult question to answer well, so I'm going to chicken out. In essence the *payload* of your container compares in size to the pointer on your system. Which means that a hash map will use up relatively less space. I like your ingenious hack around the problem! But note that you can solve this problem in O(N) without re-storing the input array. – Bathsheba Jun 04 '19 at 07:07
  • Possible duplicate of [Advantages of Binary Search Trees over Hash Tables](https://stackoverflow.com/questions/4128546/advantages-of-binary-search-trees-over-hash-tables) – Weak to Enuma Elish Jun 04 '19 at 07:09
  • The accepted answer in the duplicate I've linked directly answers this question. – Weak to Enuma Elish Jun 04 '19 at 07:10
  • For small memory, it is hard to beat `flat_set`, `flat_map` (available for instance in boost, apparently in the process of being standardized), i.e. sorted vectors. `map` and `unordered_map` should have similar memory requirements, I'd call it luck that one was just below the threshold in your particular implementation. – Marc Glisse Jun 04 '19 at 07:36
  • @Bathsheba How do you reach O(N)? Do you have a solution that would not involve some kind of sorting algorithm? – Oliv Jun 04 '19 at 08:59
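
One way to realize the O(N) idea mentioned in the comments is a single pass with an `unordered_set`, checking each element against the values seen so far. This is only a sketch of that variant (the `diffPossible` signature mirrors the problem, but this is not the commenter's exact solution); it also makes the `!= i` index bookkeeping from the original code unnecessary:

```cpp
#include <unordered_set>
#include <vector>

// Single-pass variant: for each element a, ask whether some earlier element
// prev satisfies a - prev == B or prev - a == B.
int diffPossible(const std::vector<int>& A, int B) {
    std::unordered_set<int> seen;
    for (int a : A) {
        if (seen.count(a - B) || seen.count(a + B))
            return 1;
        seen.insert(a);  // insert after the check, so B == 0 needs a true duplicate
    }
    return 0;
}
```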

2 Answers

  1. Maps are implemented as binary search trees, and each node (in addition to the useful data) typically stores 3 pointers (to the left child, the right child, and the parent).

  2. Unordered maps are implemented as hash tables, where each node belongs to a bucket's linked list. With a singly linked list, there is only 1 pointer per node. UPDATE: However, there is also an additional pointer for each bucket. In the ideal case with no collisions, there would be 2 pointers per stored element.

Note that 2 ints typically occupy 8 bytes, the same as a single pointer on a 64-bit system.


For example, look at the GNU libstdc++ implementation. The RB tree's node is defined as follows:

struct _Rb_tree_node_base
{
  typedef _Rb_tree_node_base* _Base_ptr;
  typedef const _Rb_tree_node_base* _Const_Base_ptr;

  _Rb_tree_color    _M_color;
  _Base_ptr     _M_parent;
  _Base_ptr     _M_left;
  _Base_ptr     _M_right;
  ...

There, you can observe those 3 pointers.


Generally, it's hard to say which container will consume less overall memory. However, I created a benchmark to insert 1M random numbers into both containers and measured maximum resident size (MaxRSS) to reflect all the consumed memory space including, e.g., heap housekeeping data. The results were as follows:

  1. 48,344 kB for std::map,
  2. 50,888 kB for std::unordered_map,
  3. 40,932 kB for std::unordered_map with reserve.

Note that the memory consumption of the unordered map (case 2) was higher due to reallocations of bucket lists during growth. That is what the reserve member function is for: if you care about memory consumption and know the number of elements beforehand, you should always preallocate (the same holds for vectors).

Daniel Langr
  • Parents are optional in binary search trees, however i am unsure if the std::map implementation uses them or not. Edit: am not the downvoter – buffy Jun 04 '19 at 07:03
  • It depends on the implementation, but typically a hash table will reserve a large array (or arrays) to store the data in. I don't see how a hash table could use a linked list and keep O(1) lookup. – Weak to Enuma Elish Jun 04 '19 at 07:05
  • @WeaktoEnumaElish C++ `unordered_map` uses an array of buckets and each bucket store nodes (with key-value pairs) in a linked list. See, e.g., https://stackoverflow.com/q/31112852/580083 and https://stackoverflow.com/q/21518704/580083 for extended discussions. – Daniel Langr Jun 04 '19 at 07:11
  • Oh, that's what you meant. While each node is technically in a linked list, I'd say that the array is the truer form of hash table. – Weak to Enuma Elish Jun 04 '19 at 07:15
  • "single linked list" -> "singly linked list". Otherwise it implies there is only one linked list. – Max Langhof Jun 04 '19 at 07:35
  • @buffy Without storing pointers to parents, how would you iterate over BST elements? – Daniel Langr Jun 04 '19 at 07:38

A `map` is basically a binary search tree, while `unordered_map` is implemented as a hash table. If you look at implementations of both, you'll quickly notice that a BST node is quite a bit bigger.

It also means that `map` lookups are generally slower than `unordered_map` lookups: O(log n) versus O(1) on average.

|                | `map`                                      | `unordered_map`               |
|----------------|--------------------------------------------|-------------------------------|
| Ordering       | increasing order (by default)              | no ordering                   |
| Implementation | self-balancing BST (e.g., a red-black tree)| hash table                    |
| Search time    | O(log n)                                   | O(1) average, O(n) worst case |
| Insertion time | O(log n) + rebalancing                     | same as search                |
| Deletion time  | O(log n) + rebalancing                     | same as search                |

BST:

struct node
{
    int data;
    node* left;
    node* right;
};

HashMap:

struct hash_node {
    int key;
    int value;
    hash_node* next;
};

Reference: https://www.geeksforgeeks.org/map-vs-unordered_map-c/

buffy
  • You should be aware that the worst case happens quite often when inserting into an unordered map. The standard setting is that it uses one bucket per item plus some buckets in reserve, so inserting more items than there are buckets left in reserve means resizing the bucket array and rehashing. – user6556709 Jun 04 '19 at 07:02
  • Btw a HashMap is usually an array and not a list. You would lose the whole performance benefit when using a list. – user6556709 Jun 04 '19 at 07:07
  • It depends on the implementation, but the typical disadvantage of hash maps vs binary trees is that a hash map will need to reserve large chunks of memory to hash items in to. The node for a binary tree is larger, but it only ever needs as many nodes as items. – Weak to Enuma Elish Jun 04 '19 at 07:09
  • @user6556709 The C++ Standard mandates `unordered_map` to use _open hashing_, that is, an array of buckets, where data are stored in the buckets (typically by using a linked list for each bucket). – Daniel Langr Jun 04 '19 at 07:14
  • @DanielLangr I was commenting on the OP, who wrote hash_node as a linked-list node. Btw, the items in the same bucket are also stored in an array; a linked list is too expensive. – user6556709 Jun 04 '19 at 07:19
  • @user6556709 No, they are not. They are in a linked list. You can check here: https://wandbox.org/permlink/JSFJB4PM5VjvUtbp. The 2 elements are too distant to be in an array (the distance would be 8 otherwise). – Daniel Langr Jun 04 '19 at 07:29
  • @user6556709 Moreover, in C++17, you can extract nodes and insert them into a hash table without copying/moving keys/values. That would simply not be possible with arrays. How would you extract a key-value pair from an array without copying/moving? – Daniel Langr Jun 04 '19 at 07:34
  • Maybe I just don't understand the intricacies of open hashing, but I don't see how it removes the need for large arrays. It just appears to solve collisions. I went ahead and inserted one million random items into a `std::map` and `std::unordered_map` and kept track of the memory allocated using `_CrtSetAllocHook`. `map` used 24,000,000 bytes while `unordered_map` used 32,774,100 bytes. – Weak to Enuma Elish Jun 04 '19 at 07:37
  • @WeaktoEnumaElish Did you also track deallocations? When a hash table needs to resize its bucket list, it allocates a large array, but also deallocates the smaller one. You can improve the situation by using the `reserve` member function. BTW, it might not be so easy to evaluate program memory consumption. It might be better to measure maximum resident size. – Daniel Langr Jun 04 '19 at 07:44
  • @DanielLangr Funnily enough, I messed up part of it and I wasn't tracking deallocations. I fixed it and ran it again, but got the same exact result. – Weak to Enuma Elish Jun 04 '19 at 07:48
  • @WeaktoEnumaElish I did the same but measured MaxRSS. For a map, it was 48,344 kB; for an unordered map, 50,888 kB; however, for an unordered map with reserved (preallocated) bucket space, it was just 40,932 kB. You should always preallocate if you care about memory consumption and know the number of elements beforehand (it's the same as with vectors). (Note that my numbers are higher because your measurements do not reflect heap housekeeping data, padding, alignment, etc. You merely measure the useful data.) – Daniel Langr Jun 04 '19 at 07:53
  • @DanielLangr I ran it again, but reserved 1M space beforehand. This time the results were a lot closer, with `unordered_map` only slightly larger than `map`. I wonder what causes the difference with MaxRSS. As far as I can tell, MaxRSS would also include stack memory, but that shouldn't nearly make that large of a difference. Maybe it's a difference in the implementations of the library? – Weak to Enuma Elish Jun 04 '19 at 08:02
  • @WeaktoEnumaElish The stack space is negligible here (at least in my implementation of the benchmark). To understand that difference you need to understand how heaps are implemented. However, this is a topic for whole books, you can find information all over the internet. As I wrote, each allocation at least requires some housekeeping data. Also, not each byte of the address space can be used (due to padding, alignment). Etc... – Daniel Langr Jun 04 '19 at 08:09