
This question may seem a bit pedantic, but I've been trying to dive deeper into amortized analysis and am confused as to why insert for a hash table is O(1) amortized. (Note: I'm not talking about table doubling; I understand that.)

Using this definition, "Amortized analysis gives the average performance (over time) of each operation in the worst case," it seems like the worst case for N inserts into a hash table would be a collision on every operation. I believe universal hashing guarantees that any two distinct keys collide with probability at most 1/m, which keeps expected chain lengths short when the load factor is kept low, but isn't it still theoretically possible to get a collision on every insert?

It seems like what is technically being shown is that a hash table's insert is O(1) on average (in expectation), rather than O(1) amortized in the worst case.

Edit: You can assume the hash table uses basic chaining, where the element is placed at the end of the corresponding linked list. The real meat of my question is about amortized analysis of probabilistic algorithms.
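For concreteness, this is the kind of chained table I have in mind, as a minimal Python sketch (the class and names are just illustrative, not from any particular library):

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]    # one chain (list) per slot

    def insert(self, key, value):
        # Hashing is O(1); walking the chain to check for a duplicate key
        # costs O(length of this chain), which is what collisions inflate.
        chain = self.buckets[hash(key) % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:                          # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))                # otherwise append at the end
```

Every insert pays for one hash evaluation plus however far it has to walk down its own chain, which is exactly where collisions hurt.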

Edit 2: I found this post on quicksort: "Also there’s a subtle but important difference between amortized running time and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small possibility that quicksort will cost Θ(n^2) dollars, but the probability that this will happen approaches zero as n grows large." I think this probably answers my question.
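For reference, the random-pivot quicksort the quote is talking about is roughly this kind of thing (my own sketch, not code from the linked post):

```python
import random

def quicksort(a):
    # Random-pivot quicksort: expected O(n log n) over the pivot choices,
    # but a single unlucky run (every pivot lands at an extreme) still
    # costs Theta(n^2). The randomness lives inside one call; nothing is
    # being amortized over a sequence of operations.
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))
```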

ford prefect
K.Enfield
    There are many ways of implementing hash tables, and the choice of implementation makes a difference. For example, if the hash buckets are (unsorted) linked lists, then insert is *always* O(1), assuming you never resize the table. – psmears Sep 04 '17 at 21:13
  • Different implementations will deal with collisions in different ways. For example, a collision may go in the next available spot, or it may be stored in the same spot but via linked list, or similar but via tree. Please specify details on how collisions are dealt with so we have a fixed target to analyse. – TheGreatContini Sep 04 '17 at 21:28
  • Strictly speaking, according to the definition of "amortized" given in the question, you are absolutely correct. So if a hash algorithm handles collision by a linked list, its "amortized" complexity would be `O(n)`. Commonly though the term is used with the meaning of "average amortized" you gave. – Jardel Lucca Oct 10 '21 at 05:44

1 Answer


You could theoretically get a collision on every insert, but that would mean you had a poorly performing hash function that failed to spread values across the "buckets". A theoretically perfect hash function would always put a new value into an empty bucket, so that each key gets its own bucket. (I am assuming a chained hash table and referring to each chain as a "bucket"; that's just how I was taught.) A theoretical worst-case function would stick all keys into the same bucket, leading to a single chain in that bucket of length N.
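To make that concrete, here is a rough Python sketch (the function names are made up for illustration) that fills a chained table once with a degenerate hash function and once with a reasonable one, and compares the longest chain each produces:

```python
def degenerate_hash(key, m):
    return 0                          # worst case: every key lands in bucket 0

def reasonable_hash(key, m):
    return hash(key) % m              # spreads keys across the m buckets

def longest_chain(hash_fn, keys, m=4096):
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[hash_fn(k, m)].append(k)
    return max(len(b) for b in buckets)

keys = ["key%d" % i for i in range(1000)]
print(longest_chain(degenerate_hash, keys))   # 1000: one chain holds everything
print(longest_chain(reasonable_hash, keys))   # small: most keys get their own bucket
```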

The idea behind the amortization is that, given a reasonably good hash function, N inserts should take linear total time, i.e. constant time per insert on average, because the number of inserts that cost more than O(1) is greatly dwarfed by the number of inserts that are simple and O(1). That is not to say that insertion is without any computation: the hash function still has to be evaluated, and in some special cases hash functions can be more computationally expensive than just scanning a short list.
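One rough way to see the "dwarfed by the cheap cases" point (an illustrative sketch, not a proof) is to count how many existing chain entries each insert walks past and average that over all N inserts:

```python
def average_probes(keys, m):
    buckets = [[] for _ in range(m)]
    probes = 0
    for k in keys:
        chain = buckets[hash(k) % m]
        probes += len(chain)          # entries examined before appending
        chain.append(k)
    return probes / len(keys)

keys = ["key%d" % i for i in range(10000)]
print(average_probes(keys, m=20000))  # low load factor: small constant per insert
print(average_probes(keys, m=1))      # every insert collides: cost grows with N
```

The second call is the "all keys in one bucket" scenario from above: the average cost per insert is no longer a constant.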

At the end of the day, this brings us to an important idea behind this kind of analysis: when calculating the time complexity of a sequence of operations, the total cost is dominated by the most frequently executed case. Here, that is the insertion of a value whose hash does not collide with another key's.

ford prefect