Wrong Complexity Calculation for Hash Tables?

Question

I was reading: https://www.geeksforgeeks.org/given-an-array-a-and-a-number-x-check-for-pair-in-a-with-sum-as-x/

I think there is a mistake with calculating the time complexity for Method2 (Hashing) where they claimed it's O(n) and I insist it's O(n) Amortized.

Algorithm:

1 Initialize an empty hash table s.

2 Do following for each element A[i] in A[]

2.1 If s[x – A[i]] is set then print the pair (A[i], x – A[i])

2.2 Insert A[i] into s.

Step 1 is done in O(1). Step 2 does O(n) iterations, Where for each one we are doing O(1) Amortized (2.1 & 2.2) so in total we have O(n) Amortized.

Ok, they claim O(n) and you claim O(n) amortized. But why is that difference important? — user3386109, Oct 26 '21 at 06:21
it's two different things... Only one can be true. @user3386109 — Daniel, Oct 26 '21 at 06:24
Does [this](https://stackoverflow.com/a/200414/16757174) answer your question? Specifically, the definition of O(1) amortized cost as 'There is a constant c, such that for every sequence of operations (also one ending with a costly operation) of length L, the time is not greater than c*L'? When you total up all amortized costs, the result becomes a true upper bound on the total cost. — kcsquared, Oct 26 '21 at 06:33
Does it matter? The number of iterations is a very poor metric for algorithm performance on modern computers. The number of comparisons is relevant though. — Lundin, Oct 26 '21 at 07:47
@user4581301 did you read the post? how can both be wrong, one is correct for sure — Daniel, Oct 26 '21 at 09:02
Do not tag both C and C++ except when asking questions about differences or interactions between the languages. — Eric Postpischil, Oct 26 '21 at 10:53
@user4581301: “Only one can be true” in common English usage means at most one can be true, not that exactly one is true. So it admits that case that both are false. — Eric Postpischil, Oct 26 '21 at 10:59
@nielsen: The C++ standard says that `insert` for `unordered_set` is “Average case O(1)”, by which it means O(1) amortized). Hash insertion cannot be O(1) because collisions increase time and grow with the number of items in each bucket. But they can be O(1) amortized by increasing the table size as it fills and rehashing to move all entries into the new larger table. — Eric Postpischil, Oct 26 '21 at 11:02
@Daniel: “it's two different things... Only one can be true.”: O(n) implies O(n) amortized. If each instance of some operation X is takes at most cn steps (is O(n)), then the average of n instances of operation X takes at most cn steps (so X is O(n) amortized). They are different things, but the set of O(n) operations is a subset of the set of O(n) amortized operations. — Eric Postpischil, Oct 26 '21 at 11:12
@EricPostpischil Right. I missed the C++ tag. In the linked text, the C implementation uses a boolean table with the identity function as the hash function so that is O(1). — nielsen, Oct 26 '21 at 11:25

Eric Postpischil · Answer 1 · 2021-10-26T11:24:05.047

When an O(1) amortized step is performed n times, it is not valid to conclude the total cost is just O(n) amortized. The fact that a step is O(1) amortized means its average cost for n times is at most some constant c, and the fact that its average cost is at most c implies the total cost for those n steps is at most cn, so the cost for n steps is O(n), not just O(n) amortized.

By the definition of amortized cost with the aggregate method, the fact that a operation is T(n)/n amortized means there is some upper bound T(n) on performing n operation. So, if an operation is O(1) amortized, meaning there is some c such that the average cost is at most c, we have T(n)/n ≤ c, and therefore T(n) ≤ cn, and therefore performing n operation has at most cn cost. Therefore, the cost of n operations is O(n), not just O(n) amortized.

There can be some confusion in considering operations in isolation rather than as part of a sequence of n operations. If some program executes billions of unordered_set insertions and we take a random sample of n of them, it is not guaranteed that those n have an O(1) amortized time. We could have been unlikely to get many of the insertions that happened to be rebuilding the table. In such a random selection, the statistical average time would be O(1), but each sample could fluctuate. In contrast, when we look at all the insertions to insert n elements in the table, their times are correlated; the nature of the algorithm is such that it guarantees the table rebuilds occur only with a certain frequency, and this guarantees the total amount of work to be done over n insertions is O(n).

`If some program executes billions of unordered_set insertions and we take a random sample of n of them, it is not guaranteed that those n have an O(1) amortized time` - this is true, but seems to be more of a limitation of the aggregate method than fundamental to amortization, right? For example, using a potential method, you can get every insertion to have exactly the same constant upper bound amortized cost (but not necessarily the same average actual cost) — kcsquared, Oct 26 '21 at 12:53

Wrong Complexity Calculation for Hash Tables?

1 Answers1