1

I have a weird issue that I don't know if I am miss reading the documentation or my computer is doing something strange.

I have an unordered_map. I want to iterate over the buckets of the unordered_map in bucket order. This part is important as I need the access to be relatively random. I searched over cplusplus.com and I found this. The code is as follows:

// unordered_map::bucket
#include <iostream>
#include <string>
#include <unordered_map>
int main ()
{
  std::unordered_map<std::string,std::string> mymap = {
    {"us","United States"},
    {"uk","United Kingdom"},
    {"fr","France"},
    {"de","Germany"}
  };

  for (auto& x: mymap) {
    std::cout << "Element [" << x.first << ":" << x.second << "]";
    std::cout << " is in bucket #" << mymap.bucket (x.first) << std::endl;
  }

  return 0;
}

The output I was expecting and wanted on my computer was

Element [us:United States] is in bucket #1
Element [de:Germany] is in bucket #2
Element [fr:France] is in bucket #2
Element [uk:United Kingdom] is in bucket #4

However the output I am getting is instead sorted by value type which is strange

Element [de:Germany] is in bucket #2
Element [fr:France] is in bucket #3
Element [uk:United Kingdom] is in bucket #1
Element [us:United States] is in bucket #1

I even tried replacing the value with a class that had no comparison operators and it still was able to sort them. Is this something with the way my computer is storing the map or is cplusplus.com outdated? I was able to iterate over the buckets through a loop like this:

for ( unsigned int i = 0; i < b.bucket_count(); ++i) {

    for ( hash_table::const_local_iterator image_iterator = 
    b.begin(i);image_iterator!= b.end(i); ++image_iterator ){

The only issue is that I need to be able to skip a certain number of values i.e. I only want 1 item for every 100 which requires complicated loops and is slow.

Any help would be appreciated. I can't seem to figure this out!

[EDIT] With my code my unordered_map is actually unordered_map where point is a simple class that just has two member variables and no helper functions.

When I run the loop above on my map here is my output. I have linked a text file as it is a pretty long file here What confuses me even more is that my Point class has no comparison operators. Could it be my insertion order that is causing this?

Cubbi
  • 46,567
  • 13
  • 103
  • 169
  • Note that your output is also sorted by key type and also sequential by bucket, but starting at `#2` with modulo arithmetic. – François Andrieux Nov 22 '18 at 16:02
  • As far as I know, the order of elements when iterating over an `std::unordered_map` is unspecified. I understand why you might expect it to order them by bucket but I don't know of anything that guaranties it. Evidently that's not what's happening here. – François Andrieux Nov 22 '18 at 16:04
  • How did you determine the bucket ids for the expected output? – François Andrieux Nov 22 '18 at 16:05
  • @FrançoisAndrieux I switched the keys to be out of ordered but it still outputted the same way. I am also confused on why I got a different output than the documentation I found. – Michael Honaker Nov 22 '18 at 16:08
  • 3
    It's not clear what documentation you are referring to. The order of iterating over `unordered_map` is not specified so should not be given by any documentation. – François Andrieux Nov 22 '18 at 16:10
  • Where did you find a documentation that specifies the order? On my platform the output is different from your output. – Jabberwocky Nov 22 '18 at 16:10
  • 3
    It's just a coincidence. I can reproduce with the specific keys you show, but change them to `"a"`, `"b"`, `"c"`, "`d`" and [the output is no longer sorted](https://rextester.com/OIZQ97499) (either by key or by bucket). Unordered map is unordered. – Igor Tandetnik Nov 22 '18 at 16:13
  • What platform and which compiler do you use? Maybe there is an implementation difference. – Darkproduct Nov 22 '18 at 16:13
  • @FrançoisAndrieux I am refering to multiple websites/tutorials that have them in bucket order. Here are a few example [here](http://www.cplusplus.com/reference/unordered_map/unordered_map/bucket/) , and [here](https://www.geeksforgeeks.org/traversing-a-map-or-unordered_map-in-cpp-stl/) and [here](https://thispointer.com/how-to-iterate-over-an-unordered_map-in-c11/). Even if this is not the case how can I iterate over the buckets in one for loop – Michael Honaker Nov 22 '18 at 16:13
  • @IgorTandetnik This was just an example. When I run mine on a much larger set and output the values they are in order. The only difference is that my unordered_map is with floats as the key type. – Michael Honaker Nov 22 '18 at 16:16
  • @Darkproduct g++ 7.3.0 with no extra commands so just `g++ main.cpp -o main.out` – Michael Honaker Nov 22 '18 at 16:17
  • @MichaelHonaker That's easily possible because a standard library implementation may use the identity function as hash for integers (and possibly does something similar for floats). An unordered map from `int` to something _will_ have its buckets sorted by key if the `std::hash` is just the identity function. – Max Langhof Nov 22 '18 at 16:17
  • Wouldn't it be much easier to just use something like (this)[https://stackoverflow.com/questions/158836/random-element-in-a-map] – Darkproduct Nov 22 '18 at 16:21
  • @MichaelHonaker I can't see in any of those linked sites anything that claims an output order. Those are example outputs and the results may vary. [This one](https://www.geeksforgeeks.org/traversing-a-map-or-unordered_map-in-cpp-stl/) even goes out of it's way to explicitly say *"NOTE: For unordered_map output rows can be in any order"*. – François Andrieux Nov 22 '18 at 16:25
  • @FrançoisAndrieux look at my edit to the original question. Over a large data set mine is in a specific order which confuses me. – Michael Honaker Nov 22 '18 at 16:26
  • @Darkproduct my only issue is that it needs to be a percentage based i.e. if it printing out 1% of 10000 items it must only print out 100 values. Using random there is a chance it could print out all 10000 – Michael Honaker Nov 22 '18 at 16:28
  • @MichaelHonaker I'm not sure I understand your edit completely, but you only need equality comparison and to have an `std::hash` specialization to be used as a key. It never uses `operator<` on the keys. – François Andrieux Nov 22 '18 at 16:29
  • 1
    @MichaelHonaker The bottom line is you cannot rely on `unordered_map`'s ordering of elements. The order is unspecified, but it doesn't mean random. So even a strangely well ordered sequence can be a legitimate ordering. – François Andrieux Nov 22 '18 at 16:30
  • @FrançoisAndrieux sorry for the confusion. if you look at the file I linked the output is sorted by values even though my value class doesn't have any comparison i.e. there is no way for my computer to even know which Point object is bigger than the other. What is confusing me is that it is not outputing in any order other than sorted by value. Why is my computer iterating from bucket 51534 to bucket 536493. That makes no sense – Michael Honaker Nov 22 '18 at 16:31
  • @MichaelHonaker It seems then that whatever is hashing your keys happens to produce hashes that sort in the same order as you would expect your keys to be sorted. Edit : Since your key type seems to be a user made type, look at the `std::hash` being provided and you will likely see why your key hashes are sorted. – François Andrieux Nov 22 '18 at 16:33
  • @FrançoisAndrieux I edited my comment. Clarifying question: should the iterator iterate from bucket 1 all the way to bucket 10000. Because my program jumps around buckets for no apparent reason – Michael Honaker Nov 22 '18 at 16:36
  • @MichaelHonaker There's only so many ways to say that the order is *unspecified*. Unless you find documentation that explicitly says that the buckets will be iterated over sequentially then any expectation that they are is unfounded. It's not surprising that the program jumps around buckets, it's not expected to be sequential. Unspecified implies not specified to be sequential. *In your case* it looks like the iteration is in order of key which an implementation is allowed to choose to do, – François Andrieux Nov 22 '18 at 16:38

1 Answers1

3

In std::unordered_map the bucket for a particular element is completely determined by the hash value computed on the key, std::unordered_map use as the corresponding specialization of std::hash as default hash function, which is implementation defined and is not even guaranteed to be the same throughout different program execution. see std::hash.

As @François Andrieux said in the comment section, order of iteration of std::unordered_map is not specified, hence, you cannot expect the same iteration behavior in all machines, e.g the output in my computer is:

Element [de:Germany] is in bucket #0
Element [fr:France] is in bucket #3
Element [uk:United Kingdom] is in bucket #4
Element [us:United States] is in bucket #4
Jans
  • 11,064
  • 3
  • 37
  • 45
  • Look at my edit. My confusion is on why it is ordered in that way even though there is no comparison operator – Michael Honaker Nov 22 '18 at 16:24
  • That's right, *..but you can't expect to guess what it is until you try it* ... that was what i meant, Fixed ... Thanks – Jans Nov 22 '18 at 16:29