3

To illustrate my question, I copied below code from the "Phonebook" example of Boost help doc.

struct phonebook_entry
{
  std::string family_name;
  std::string given_name;
  std::string ssn;
  std::string phone_number;
}

And I can do a partial search as below

// search for Dorothea White's number
phonebook::iterator it=pb.find(boost::make_tuple("White","Dorothea"));

However, if I need to count the number of people who's family name is "White", then go on to find how many "White"'s have "Dorothea"' as their given name, then what's the best way to do it? I think I can do two partial queries, with pb.find(boost::make_tuple("White") and pb.find(boost::make_tuple("White","Dorothea"). But I am concerned whether this will cause a performance issue? Since the second query has no knowledge of the first query and just search the whole container. Does Boost provide something like below:

std::pair<iterator,iterator> partialResults=pb.equal_range("White");
std::pair<iterator, iterator> partialOfPartial=pb.equal_range("Dorothea", partialResults);

Or is there any smarter way to do this? Not only from convenience point of view, but for performance.

TemplateRex
  • 69,038
  • 19
  • 164
  • 304
Michael
  • 673
  • 2
  • 5
  • 23

1 Answers1

3

Because the phonebook has a composite key, it is sorted on given names within last names. So you can call the regular std::equal_range on the first search result to match a dummy phonebook_entry that has only "Dorothy" defined:

int main() 
{
    phonebook pb; // no initializer_list support for multi_index_container yet
    pb.insert({ "White", "Dorothy", "1" });  
    pb.insert({ "Black", "Dorothy", "2" });  
    pb.insert({ "White", "John",    "3" });  
    pb.insert({ "Black", "John",    "4" });  
    pb.insert({ "White", "Dorothy", "5" });  

   auto const w = pb.equal_range("White");
   auto const d = phonebook_entry{"", "Dorothy", ""};
   auto const wd = std::equal_range(w.first, w.second, d, [](phonebook_entry const& lhs, phonebook_entry const& rhs) {
       return lhs.given_name < rhs.given_name; 
   });
   std::for_each(wd.first, wd.second, [](phonebook_entry const& pbe) { 
       std::cout << pbe.phone_number << "\n"; 
   });
}

Live example that will print for "White, Dorothy" the phone numbers 1 and 5.

Community
  • 1
  • 1
TemplateRex
  • 69,038
  • 19
  • 164
  • 304
  • Thank you for your answer. I have another question though. Will std::equal_range and pb.equal_range have the performance? The above was just a sample, the actual work I am going to do would contain 100 million records. Would such a number of records impact performance of multi_index_container? – Michael Jul 20 '13 at 08:10
  • @Michael no problem, it was nice learing how to use this Boost library, I had been wanting to for quite a while :-) – TemplateRex Jul 20 '13 at 08:12
  • @Michael regarding performance: `equal_range` (both the container member and the non-member STL version) has `O(log N)` complexity on sorted input, which is pretty good. If you want faster algorithms, you could change to a hashed key (within the multi_index framework). THis would give average-case `O(1)` lookup complexity, but that might come at the cost of a bit higher memory layout (check the docs for the precise specs). The 2nd subsearch on given name could become linear if the range is not sorted(not sure, check the docs). You should profile which version is best for your application. – TemplateRex Jul 20 '13 at 08:20