0

Consider the following class

class Person
{
     public:
          explicit Person(RegistrationNumber id, std::string const& name);

          RegistrationId id() const;

          std::string const& name() const;
          Person& name(std::string const&);

          // ...
     private:
          RegistrationNumber m_id;
          std::string m_name;
          // ...
};
  1. Now I want to be able to find a person by its id, and update its name. For that, I could use std::set<Person, PersonIdCompare>, but then I cannot modify the record without a const_cast.

  2. Is it better then to store Persons in a std::map<RegistrationId, Person>, so the record can be changed, though this requires the id to be stored twice.

  3. Or maybe I should use a map, and remove the id from the Person class. But then I cannot get the id if I only have access to the Person, and not std::pair<RegistrationNumber const, Person>, so this may require to pass these pairs around.

What option is most efficient or less error prone. Think of the scenario where someone adds a setter for the person id, which would break option 1 and 2, or storing the id twice as for option number 2.

user877329
  • 6,717
  • 8
  • 46
  • 88
  • How many people are going to be in the data set? A linear traversal through a vector is pretty fast for small-ish data sets. – NathanOliver Mar 03 '20 at 17:50
  • @NathanOliver I also like to be able to force non-duplicated entries, and invariant location is also good to have. – user877329 Mar 03 '20 at 17:54
  • 3
    In that case use option 2. One extra int per object isn't much when you're already use a node based container. – NathanOliver Mar 03 '20 at 17:56

1 Answers1

0

Most of the time it's best to just throw everything in a (potentially) sorted std::vector<Person>. Then you can do binary search on the vector to look up an element. A small convenience class that encapsulates the vector object would be useful.

You have the advantage of cache locality (esp. during iteration over all entries). And at least the same retrieval performance than a set or map (O(log n)). Without optimisation however, insertion will be slightly worse (O(n) versus O(log n)). If you do bulk insertion, you can just insert all items, then sort the vector, which gets you set/map performance.

Addendum: If the ids are unique and ascending (with no or very little gaps) a vector is also the obvious choice, as you can just use the id as index. If the ids are not guaranteed to be consecutive consider something like vector<optional<Person>>.

bitmask
  • 32,434
  • 14
  • 99
  • 159
  • I should give that a try. My dataset is also invariant after load time. So load, and sort would do the trick. – user877329 Mar 03 '20 at 18:02
  • @user877329 Oh, if it is invariant and only loaded once then *definitely* go for a vector approach! – bitmask Mar 03 '20 at 18:03
  • Though there is a reason for std::set requiring const_iterator. And that is to guarantee the invariant. So if you construct a LinearSet out of an std::vector, the getters and iterators associated with that container would still require const. – user877329 Mar 03 '20 at 18:10
  • @user877329 In general yes, but if the id is invariant, your indexing will be fine. Yes, this runs the risk of someone adding a setter for the id in five years, and everything crashes. A comment on m_id saying `must be invariant` might help there. Also, you can literally make m_id const. – bitmask Mar 03 '20 at 18:15
  • Which leads into this discussion: https://stackoverflow.com/questions/4136156/const-member-and-assignment-operator-how-to-avoid-the-undefined-behavior – user877329 Mar 03 '20 at 18:44
  • @user877329 That is an old discussion. AFAIK, if you delete the assignment operator, vector will move objects around. (*unverified, though*). – bitmask Mar 03 '20 at 18:46
  • std::sort still fails.https://gcc.godbolt.org/z/Wh5ZwS. I guess I have found an impossible problem. At least with current knowledge. – user877329 Mar 03 '20 at 19:47
  • Oh right, sort needs to swap. Well then don't have your member const, but use a comment. Or ... provide your own swap function, maybe? – bitmask Mar 03 '20 at 20:12
  • 1
    If you use your own swap function make sure there is a comment explaining WHY you need your own swap function. Nothing is more infuriating than seeing a piece of code and not knowing WHY it is there if it looks like a duplicate of a standard library function... – Jerry Jeremiah Mar 03 '20 at 21:15