10

What is the time complexity of find method in unordered_set<int>?

And also is it possible to change the hash functions ?

HDJEMAI
  • 9,436
  • 46
  • 67
  • 93
navid mahdian
  • 143
  • 1
  • 1
  • 6
  • 7
    From the link in your question, you can find that the complexity has an average case of "constant" and a worst case of "linear in container size". What other information information are you looking for exactly? Also, you change the hash function either by changing the `Hash` template parameter of your `unordered_set`, or by specializing the `std::hash` template for your type – KABoissonneault Feb 27 '17 at 14:08
  • Look [here](https://stackoverflow.com/questions/17016175/c-unordered-map-using-a-custom-class-type-as-the-key) for hash function example in the @jogojapan answer. – HDJEMAI Feb 27 '17 at 14:29

2 Answers2

7

what is the time complexity of find method in unordered_set?

...it's right there in the page you linked:

Complexity:

Average case: constant.

Worst case: linear in container size.


and also it it possible to change the hash functions?

Yes. Again, look at the documentation!

std::unordered_map takes an Hash template parameter. It's a customization point where you can inject your own hashing logic. A custom Hash must satisfy the Hash concept.

Vittorio Romeo
  • 90,666
  • 33
  • 258
  • 416
  • the page hasn't talked about the type INT specifically. – navid mahdian Feb 27 '17 at 14:10
  • 1
    @navidmahdian: why would that matter? If it doesn't specifically talks about a type, it means that it's valid for all types. – Vittorio Romeo Feb 27 '17 at 14:11
  • that is true but i think the type int might always find an element in O(1) because the hashing is much more easy – navid mahdian Feb 27 '17 at 14:14
  • 2
    The complexity of the search is independent on the complexity of the hashing of the key. While `std::hash{}( 2 )` might return the integer itself, the container still has to probe and compare to that integer key to the values contained in the container. I don't know what kind of magic you expect, but if you have research that shows O(1) for all cases (as in, the key never collides in the storage), then please share it – KABoissonneault Feb 27 '17 at 14:19
4

I guess you are getting confused by the default max_load_factor being 1. When you insert an int x in the unordered_set, it goes to the bucket i (i=x%number of buckets). So as you can imagine even if the hash function wont have collisions, as it maps each int with itself, the mod operation can have "collisions" in some cases. For example, if you insert 1, 4 and 6 in that order, both 1 and 6 will be in the same bucket, and the find function will need to go through the bucket to find them. The number of buckets is only increased when the load factor reaches the max load factor. The load factor is the arithmetic mean of the number of elements per bucket. So you can actually have more than one element in each bucket, and you can even have all elements in the same in the same bucket. In that case, finding an element that's inside the set would need a traditional sequential search (O(n)) inside the bucket. Here you have an example:

unordered_set<int> n;
n.insert(1);
n.insert(12);
n.insert(23);
n.insert(34);
n.insert(45);

In that case, every int is in the bucket 1, so when you look for 56 (56%11 = 1) you need to go through the whole bucket (size n, O(n)). The load factor is 0.4545 (5 elements / 11 buckets), so no buckets are added. You can reduce the max_load_factor (some languages use a load factor of 0.75), but that would increase the number of rehashes, as you would need to reserve buckets more frequently (the process of reserving is amortized constant, as it uses the same method std::vector uses, that's why in the example we have 11 buckets)