0

Announcement of the exercise: https://cses.fi/problemset/task/1640/

I would like to understand why one algorithm is much faster than another (about 25 times) when I think they have the same complexity O(n log n) but the fastest one should in principle loop 1 more time on my list so I would have really thought it slower.

Slow:

#include <bits/stdc++.h>

using namespace std;

 
int main() {
    int n, x; cin >> n >> x;
    vector<int> a(n);
    map<int, int> vals;
    for (int i = 0; i < n; i++) {
        cin >> a[i];
    }
    for (int i = 0; i < n; i++) {
        if(vals.count(x - a[i])){
            cout << i + 1 << " " << vals[x - a[i]] << "\n";
            return 0;
        }
        vals[a[i]] = i + 1;
    }
    cout << "IMPOSSIBLE" << '\n';
}

fast:

#include <bits/stdc++.h>
 
using namespace std;
 
int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, x;
    cin >> n >> x;
    pair<int, int> arr[n];
    for (int i = 0; i < n; i++) {
        cin >>arr[i].first;
        arr[i].second = i + 1;
    }
    sort(arr, arr + n);
    int i = 0, j = n - 1;
    while (i < j) {
        int sum = arr[i].first + arr[j].first;
        if (sum == x) {
            cout << arr[i].second << ' ' << arr[j].second;
            return 0;
        } else if (sum < x) {
            i++;
        } else {
            j--;
        }
    }
    cout << "IMPOSSIBLE";
    return 0;
}
Botje
  • 26,269
  • 3
  • 31
  • 41
antho
  • 534
  • 4
  • 18
  • 1
    Why does only the second one have `ios::sync_with_stdio(false); cin.tie(0);`? Have you considered that the difference can be due to this? – interjay Nov 02 '20 at 09:53
  • 3
    please provide a [mre] with input and outputs and what the code is supposed to do – Alan Birtles Nov 02 '20 at 10:05
  • I did the test and the difference is much smaller but the second one is still twice as fast. I tried with an unordered_map and I have a time limit on one test. I really don't understand – antho Nov 02 '20 at 10:07
  • I put a link where everything is written and the code is testable to know the performance, isn't it a good practice ? – antho Nov 02 '20 at 10:08
  • In the first code, you are performing a sort iteratively, which is slightly sub-optimal with respect to the second code, where sort is performed in one shot. The second part is O(n) only. – Damien Nov 02 '20 at 10:09
  • I am afraid that you are not benchmarking the algorithm, but the time for the cout statements. –  Nov 02 '20 at 10:29
  • Yes, an error of inattention, I was fixed on the algorithm. But it remains slower and the explanation below allowed me to really understand the difference between the two. – antho Nov 02 '20 at 10:31
  • No links are not sufficient, links die. The question needs to be self contained, you can provide a link for additional context but the question must make sense without it – Alan Birtles Nov 02 '20 at 10:54

1 Answers1

2

Note that big-O complexity purposefully ignores constant factors as well as architecture-specific details like branch prediction and caching. The work for the first is N + N * (log_2(N) + log_2(N)) (constructing the vector + two map accesses per element), while the second is N + N * (log_2(N) + 1) (constructing the vector, sorting it, then walking it once).

If you then look at the low-level access patterns, you will see that (assuming a binary search tree) every map access needs at worst log_2(2*10^5)=18 memory accesses, some of which will be cache misses, so they will have to wait for (comparatively slow) main memory.

Compare this to the second, where you fetch data linearly from left to right (the best case for a cache) and right to left (about the same). Assuming 4-byte ints and 64-byte cache lines, that means you only need two main memory accesses per every 16 iterations of the loop, and the other memory accesses can be satisfied from the L1 cache.

Botje
  • 26,269
  • 3
  • 31
  • 41