
I am creating a hashtable and inserting n elements into it from an unsorted array. Since I am calling the hash function n times, wouldn't the time complexity to create/populate the hashtable be O(n)?

I tried searching everywhere, but every source discusses complexity in the case of collisions. None of them covers how a hashtable could be created in O(1) even in the perfect case, since I still have to traverse the array to pick the elements one by one and put them in the hashtable.
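To make the count concrete, here is a toy sketch (the hash function, table size, and array are all hypothetical) that tallies hash calls while populating a table, ignoring collisions:

```python
# Hypothetical sketch: count hash-function calls while populating
# a table with n elements from an unsorted array.
calls = 0

def hash_fn(key, size):
    """Toy hash function; bumps a global call counter."""
    global calls
    calls += 1
    return key % size

def build_table(arr, size):
    table = [None] * size
    for x in arr:                       # n iterations...
        table[hash_fn(x, size)] = x     # ...each an O(1) insert (no collision handling)
    return table

arr = [12, 7, 3, 9, 26]
build_table(arr, 11)
print(calls)  # 5 -- one hash call per element, so n inserts cost O(n) in total
```

Each individual insert is O(1); the O(n) comes only from doing n of them.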

Ambidextrous
  • Do you want to make the insert time constant? Is that it? – Vinicius Zaramella Jan 25 '16 at 22:52
  • @ViniciusZaramella No, I am not able to understand how insertion can be O(1) for hashtables if I have to traverse each element of my array in order to insert them one by one into the hashtable. If I have 10 elements, I will run the hash function 10 times to find each element's place in the hashtable. – Ambidextrous Jan 25 '16 at 22:54
  • At least please write a reason why you are downvoting the question. – Ambidextrous Jan 25 '16 at 22:56
  • Yeah, but that (O(n)) is the complexity of populating a hashtable with the contents of an array. The O(1) refers to the complexity of one insertion. – Vinicius Zaramella Jan 25 '16 at 22:57
  • I did not downvote... it was someone else. – Vinicius Zaramella Jan 25 '16 at 22:57
  • Oh, I see, but do you know why we ignore the complexity of populating the hashtable? – Ambidextrous Jan 25 '16 at 22:58
  • Not sure; maybe because it is more important to know that the amount of time to insert a value into the hashtable does not increase with the size of the hashtable. – Vinicius Zaramella Jan 25 '16 at 23:03
  • It's a bit like asking "why do we measure the speed of a car in kilometres per hour rather than metres per second" - it's just more often indicative of the distances and timeframes we typically travel. Insertions can happen when loading the table, as well as at any time thereafter. It's the more useful unit, and more easily compared to other operations like delete or find. – Tony Delroy Jan 26 '16 at 09:56
  • It is implementation dependent; choose it according to your data set, take the load factor into account, and rehash if needed. – Dimitar Jan 26 '16 at 12:37
  • You are confusing the complexity of one insertion with the complexity of *n* insertions. The former is amortized O(1); the latter is O(n) since you are doing *n* O(1) operations. – chepner Jan 26 '16 at 12:47

1 Answer


When inserting a new record into a hash table using a perfect hash function, an unused index entry (used to point to the record) is found immediately, giving O(1). Later, the record will also be found immediately when searched for.
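As a minimal illustration (assuming the identity-style hash `key % size`, which happens to be perfect for these particular keys), every insert and lookup lands on its slot in a single probe:

```python
# Sketch of the "perfect hash" case: key % size never collides for
# the keys used here, so one probe suffices for insert and find.
size = 8
table = [None] * size

def insert(key, value):
    table[key % size] = (key, value)   # index found immediately: O(1)

def find(key):
    entry = table[key % size]
    return entry[1] if entry else None

insert(3, "three")
insert(5, "five")
print(find(3))  # three
```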

Of course, hash functions are seldom perfect. As the hash index becomes populated, the function will at times require two or more attempts to find an unused index entry for a new record, and every later search for that record will also require two or more attempts before the correct index entry is found. So the actual average search cost of a hash table may end up at 1.5 probes or more, a figure made up of searches where the record is most often found on the first attempt while others require two or more.
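A short sketch of one common collision strategy, linear probing (assumed here; the answer does not name a specific one), shows the probe count climbing above 1 as slots fill up:

```python
# Linear-probing sketch that counts probes per insert, to show the
# average cost creeping above one probe as the table fills.
size = 8
table = [None] * size

def insert(key):
    probes = 1
    i = key % size
    while table[i] is not None:   # collision: try the next slot
        i = (i + 1) % size
        probes += 1
    table[i] = key
    return probes

# 0 and 8 both hash to slot 0, so the second insert needs 2 probes.
print(insert(0))   # 1
print(insert(8))   # 2
```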

I guess the trick is to find a hashing algorithm that is "good enough", which means a compromise between an index that isn't too big, an average complexity that is reasonably low, and a worst case that is acceptable.

I posted on another search question here and showed how hashing could be used and a good hashing function determined. The question required looking up a 64-bit value in a table containing 16000 records and replacing it with another value from the record. Computing the second value from the first was not possible. My algorithm sported an average search complexity of <1.5 probes with a worst case of 14. The index size was a bit large for my taste, but I think a more exhaustive search could have found a better one. In comparison, binary searches required, at best, about twice as many clock cycles to perform a lookup as the hash function did, probably because their time complexity was greater.

Olof Forshell