3

Suppose I am using std::unordered_map<std::string, Foo> in my code. It's nice and convenient, but unfortunately every time I want to do a lookup (find()) in this map I have to come up with an instance of std::string.

For instance, let's say I'm tokenizing some other string and want to call find() on every token. This forces me to construct an std::string around every token before looking it up, which requires an allocator (std::allocator, which amounts to a CRT malloc()). This can easily be slower than the actual lookup itself. It also contends with other threads since heap management requires some form of synchronization.

A few years ago I found the Boost.intrusive library; it was just a beta version back then. The interesting thing was it had a container called boost::intrusive::iunordered_set which allowed code to perform lookups with any user-supplied type.

I'll explain it how I'd like it to work:

struct immutable_string
{
    const char *pf, *pl;
    struct equals
    {
        bool operator()(const string& left, immutable_string& right) const
        {
            if (left.length() != right.pl - right.pf)
                return false;

            return std::equals(right.pf, right.pl, left.begin());
        }
    };

    struct hasher
    {
        size_t operator()(const immutable_string& s) const
        {
            return boost::hash_range(s.pf, s.pl);
        }
    };

};

struct string_hasher
{
    size_t operator()(const std::string& s) const
    {
        return boost::hash_range(s.begin(), s.end());
    }
};

std::unordered_map<std::string, Foo, string_hasher> m;
m["abc"] = Foo(123); 

immutable_string token; // token refers to a substring inside some other string

auto it = m.find(token, immutable_string::equals(), immutable_string::hasher());

Another thing would be to speed up the "find and insert if not found" use case—the trick with lower_bound() only works for ordered containers. The intrusive container has methods called insert_check() and insert_commit(), but that's for a separate topic I guess.

seh
  • 14,999
  • 2
  • 48
  • 58
yonil
  • 564
  • 3
  • 12

2 Answers2

2

Turns out boost::unordered_map (as of 1.42) has a find overload that takes CompatibleKey, CompatibleHash, CompatiblePredicate types, so it can do exactly what I asked for here.

yonil
  • 564
  • 3
  • 12
1

When it comes to lexing, I personally use two simple tricks:

  1. I use StringRef (similar to LLVM's) which just wraps a char const* and a size_t and provides string-like operations (only const operations, obviously)
  2. I pool the encountered strings using a bump allocator (using lumps of say 4K)

The two combined is quite efficient, though one need understand that all StringRef that point into the pool are obviously invalidated as soon as the pool is destroyed.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • 2
    As of Boost 1.53, you can use `#include ` – Marshall Clow Feb 23 '13 at 15:36
  • Very good, thanks. I can't upgrade to Boost 1.53 for my present work. Anyway I'm making use of `unordered_map`. Essentially it's the only realistic option that doesn't involve tinkering with the interface for containers. my `immutable_string` really amounts to the same thing as `StringRef`. – yonil Feb 23 '13 at 17:17