2

Its hard to describe so I will just show the code:

#include <bits/stdc++.h> 
using namespace std; 

int main() 
{ 
    clock_t start, end; 
    unordered_map<int, int> m;
    long test=0;
    int size = 9999999;
    for (int i=0; i<size/3; i++) {
        m[i] = 1;
    }
    start = clock(); 
    for (int i=0; i<size; i++) {
        //if (m.find(i) != m.end())
            test += m[i];
    }
    end = clock(); 
    double time_taken = double(end - start) / double(CLOCKS_PER_SEC); 
    cout << "Time taken by program is : " << fixed  
         << time_taken << setprecision(5); 
    cout << " sec " << endl; 
    return 0; 
} 

The result(3 times): Without if (m.find(i) != m.end()):

Time taken by program is : 3.508257 sec 
Time taken by program is : 3.554726 sec 
Time taken by program is : 3.520102 sec 

With if (m.find(i) != m.end())

Time taken by program is : 1.734134 sec 
Time taken by program is : 1.663341 sec 
Time taken by program is : 1.736100 sec 

Can anyone explain why? What really happened inside add m[i] when the key not appeared?

Michael
  • 1,313
  • 11
  • 25
  • 1
    Just to be sure: Have you tried reversing the order of your tests and verifying that the measurements are consistent? With this type of testing, the current state of the cache along with the caching heuristics of the HW itself may well yield an impact (P.S.: I haven't gone through your actual code, under the assumption that it does what you've promised in the description). – goodvibration Jun 13 '19 at 07:57
  • 3
    This measurement of time doesn't make any sense, because you get two different maps as result, first has 9999999, the other one has only 3333333 items (with `find`). – rafix07 Jun 13 '19 at 07:59
  • 1
    @George: OP use (unordered) map which insert default element with `[]` (contrary to `std::vector`), so no out of bound accesses. – Jarod42 Jun 13 '19 at 08:02
  • @rafix07 I am on purpose to make some key not appeared in map. To see how it may affect the performance. – Michael Jun 13 '19 at 08:03
  • @George I thought it should be zero if key out of first loop... but by experience only. – Michael Jun 13 '19 at 08:05
  • *If* you use find already, would be even more efficient to reuse the result: `auto it = find(); if(it != end()) { /* use */ it->second }` – Aconcagua Jun 13 '19 at 08:09
  • `operator[]` of `unordered_map` will insert default initialized value if key does not exist. Recommended viewing: https://www.youtube.com/watch?v=lkgszkPnV8g – odyss-jii Jun 13 '19 at 08:10
  • You should try to call find, ignore the result and always do `test += m[i];`. See what timings you get. This will show if find is what makes things faster or if it's just its impact on m's content – jpo38 Jun 13 '19 at 08:11
  • 2
    Unrelated, but a must read: [Why should I not #include ?](https://stackoverflow.com/q/31816095/673852) – Ruslan Jun 13 '19 at 08:12
  • 1
    the most important question hasnt been asked yet: Did you turn on compiler optimizations? What flags did you use to compile? This information should always be included when measuring time. Measuring time with optimizations turned off is meaningless – 463035818_is_not_an_ai Jun 13 '19 at 08:14
  • @formerlyknownas_463035818 Yes, I turn off the optimization. Thanks for your comments. Reading. – Michael Jun 13 '19 at 08:16
  • 1
    1) "_is much faster than directly read_" Technically, it's **not** direct read. The `operator[]` still performs `find` under the hood. 2) "_Yes, I turn off the optimization._" Why did you turn them off, when measuring? Measuring execution time, with optimizations turned off is meaningless. – Algirdas Preidžius Jun 13 '19 at 08:17
  • measuring time with optimisation set differently to what you'd use for production release is meaningless. (And yes I have worked at a company where that was how code was released) – Tom Tanner Jun 13 '19 at 08:22
  • what do you mean with "I turn off the optimizations" ? Please include the compiler flags you used to compile in the question. Typically you have to turn **on** optimizations because they are off by default – 463035818_is_not_an_ai Jun 13 '19 at 08:23
  • @formerlyknownas_463035818 Oh I used online compiler, which has the option to choose optimazation:none. – Michael Jun 13 '19 at 08:56

2 Answers2

7

In this line

test += m[i];

the operator[] does two things: First it tries to find the entry for the given key, then if the entry does not exist it creates a new entry.

On the other hand here:

if (m.find(i) != m.end())
        test += m[i];

the operator[] does only one thing: It finds the element with the given key (and because you checked before that it exists, no new entry has to be constructed).

As the map contains only keys up to size/3 your results suggest that creating the element outweights the overhead for first checking if the element does exist.

In the first case there are size elements in the map while in the second there are only size/3 elements in the map. Note that calling operator[] can get more expensive the more elements are in the map. It is Average case: constant, worst case: linear in size. and the same holds for find. However, calling the methods many times, the worst case should amortize and you are left with average constant.

Thanks to Aconcagua, for pointing out that you did not reserve space in the map. In the first case you add many elements that require to allocate space, while in the second, the size of the map stays constant during the part you measure. Try to call reserve before the loop. Naively I would expect that the loops would be very similar in that case.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/194918/discussion-on-answer-by-formerlyknownas-463035818-why-use-find-in-unordered-ma). – Samuel Liew Jun 14 '19 at 02:18
  • Why guess at the performance when it's so easy to actually just measure: https://godbolt.org/z/Ub-Voq -- the performance is not anywhere near similar. – xaxxon Jun 14 '19 at 06:21
  • @xaxxon oh interesting, seems like I will have to fix the answer, will do that later – 463035818_is_not_an_ai Jun 14 '19 at 09:16
3

The difference with and without the if is down to you having only populated the first third of the map.

If you do a find, then the program will go and find the element, and if it exists then it will do the operator[], which finds it again (not terribly efficient), find it exists, and return the value

Without the if, when you do the operator[]. it will try and find the element, fail, and create the element (with the default value for an int, which is 0), and return it

So without the if, you are populating the whole map, which will increase the runtime.

If you wanted to be more efficient, you could use the result of the find to fetch the value

auto iter = m.find(i);
if (iter != m.end()) 
{
    test += iter->second;
}        
Tom Tanner
  • 9,244
  • 3
  • 33
  • 61