-2

I see little use of std::unordered_set or any hash table for that matter if the objects themselves are the keys: in order to search anything, you'd need to have the element in the first place. But if I already have the element, I don't need to search it (the exception being if I only need to check if it's a duplicate).

Much more interesting is the so called "heterogeneous lookup", the case where your data is an object containing multiple pieces of data, and treating one of those as key. For example I want to get all the employee data of one employee named "Pete Johnson" by doing auto it = employees.find("Pete Johnson").

I read about transparent comparators and tried creating my own hash function and 'equal_to' operator to use a subset of the data. Creating the set and adding objects works, but I still cannot get the 'find' function to work the way I want.

The easy solution for what I want is to just use std::unordered_map, but I hoped there would be a better way than to keep two copies of each key. Another workaround is to create a dummy object, copy the key into the dummy, then use that to do the lookup. However, I think the ideal solution would be if I had the option to provide a 'getKey' function to the constructor, so that 'find' can accept a naked key and use 'getKey' to compare it to the object to see if it matches.

std::string getKey(const userData_t& object){
  return object.name;
}

std::size_t userData_t_hash(const userData_t& object){
  return std::hash<std::string>()(getKey(object));
}

bool userData_t_equals_to(const userData_t& object, const std::string& str){
  return getKey(object) == str;
}

bool userData_t_equals_to(const userData_t& object, const userData_t& other){
  return getKey(object) == getKey(other);
}
Qwert Yuiop
  • 105
  • 1
  • 11
  • https://stackoverflow.com/questions/20317413/what-are-transparent-comparators transparent comparators make things more interesting – Sergey Kolesnik Dec 18 '22 at 21:52
  • 3
    You're basically asking for an explanation of what scenarios in the entire discipline of _set theory_ do not require the set to be ordered. – paddy Dec 18 '22 at 22:00
  • also a "hash-set" can be used as an intermediate data structure for some algorithms when you have pod types that already contain some property that you want to be used as a key. So you make an `std::unordered_set` of references instead on duplicating that properties into `unordered_map`'s keys – Sergey Kolesnik Dec 18 '22 at 22:00
  • 3
    Effectively, `std::unordered_set` is roughly equivalent to `std::unordered_hash` for what it can be used for. Simple scenario. Imagine I want to parse a service log of IP addresses (strings) and filter out all the duplicate items. An implementation would be to construct an instance of `unordered_set` and insert each IP address string into the set. It will automatically filter out duplicates for me. Then I can print out the unique set of IP addresses by enumerating over the collection. – selbie Dec 18 '22 at 22:04
  • Sometimes you have a map where the key itself is the data, and (much like a `bool`) the important thing is whether or not the container contains the key. And it needs to be more performant than `std::set`. Answer: `std::unordered_set`. – Eljay Dec 18 '22 at 22:25
  • @paddy, it would be helpful if you'd be able to provide one such scenario, thanks. – Qwert Yuiop Dec 19 '22 at 10:13
  • You shouldn't make big changes to a question after getting an answer. And there's little point in doing so after the question is closed. Perhaps ask a new question? – HolyBlackCat Dec 21 '22 at 13:13
  • @HolyBlackCat, The post was not closed with my permission. It said I needed to update because it wasn't good enough, so I did. – Qwert Yuiop Dec 21 '22 at 13:23
  • Mhm, but posting a new question improves your chances of getting answers. – HolyBlackCat Dec 21 '22 at 13:28
  • @HolyBlackCat, I learned today that I'm blocked from posting questions. Apparently my previous questions were not good enough, though I'm unsure what I'm doing wrong. – Qwert Yuiop Dec 21 '22 at 16:24
  • 1
    [Read this](https://meta.stackexchange.com/q/86997/353058). – HolyBlackCat Dec 21 '22 at 16:54

1 Answers1

1

Why would I need to search and obtain (a reference to) the object from the hash table if I already had the object?

One reason is to find out if the object is in the hash table. Sometimes sets are used to eliminate duplicates from a collection. If sorting is not needed, then unordered_set might be the best choice for the task.

It would make more sense if I could provide a 'getKey' function [etc.]

You can in principle. The syntax is not exactly what you had in mind, but the functionality is available.

The hash of an object does not need to take into account all data in the object. The requirement is that if two objects should be considered the same, then they must have the same hash. If you define the hash of your object to be the hash of object.getKey() and define equality to be having the same key, then you could find a real object by constructing a dummy object with just the key set. If your objects represent employees and the key is the name (not the best choice, but sufficient for now), then you could construct an employee object with no data set other than the name, and use that to find the real employee object.

Admittedly, constructing a dummy object to set one field is a bit of a hassle. Hence, C++14 introduced find overloads that accept data that compares equivalent to elements of your set. This requires that the hash and compare types have a member type named is_transparent. For example, this member type exists in std::equal_to<void> (which is not the same as the default comparison, std::equal_to<Key>). With the appropriate hash and comparison definitions, you could use employees.find("Pete") to find the employee object whose name is "Pete".

(I'm going to leave further details as "out of scope" for the question that was asked. Some general information can be found in What are transparent comparators?.)

So one use case for unordered_set is a replacement for unordered_map when the values would necessarily contain the key and the values do not need to change.

JaMiT
  • 14,422
  • 4
  • 15
  • 31