4

I have a pattern matching routine that looks up values from a std::map based on the URL used to request the command. The URL mapping table is filled with values like:

// Assume this->commands_ is defined elsewhere as std::map<std::string, int>
// Providing a number of URL examples to give an idea of the structure of
// the URLs
this->commands_["/session"] = 1;
this->commands_["/session/:sessionid/url"] = 2;
this->commands_["/session/:sessionid/back"] = 3;
this->commands_["/session/:sessionid/forward"] = 4;
this->commands_["/session/:sessionid/element"] = 5;
this->commands_["/session/:sessionid/element/:id/text"] = 6;
this->commands_["/session/:sessionid/element/:id/value"] = 7;

The tokens in each URL pattern (specified by the preceding ':') are replaced by actual values in the calls to the lookup routine (e.g., "/session/1234-8a0f/element/5bed-6789/text"), but are named parameters that I will need to retain. The list of named tokens in the above example is not exhaustive, and there may be other named tokens in the positions listed above. Note that the token values are hex-encoded numbers.

At present, I am iterating through the keys of the map, substituting the replacement tokens with regex values, and performing a regex match on the requested value using the std::tr1 regex classes, capturing the matched token names and values into vectors. The code is functionally equivalent to this (code is more verbose than ordinarily written for clarity):

// Assume "using namespace std;" has been declared,
// and appropriate headers #included.
int Server::LookupCommand(const string& uri,
                          vector<string>* names,
                          vector<string>* values) {
    int value = 0;

    // Iterate through the keys of the map
    map<string, int>::const_iterator it = this->commands_.begin();
    for (; it != this->commands_.end(); ++it) {
        string url_candidate = it->first;

        // Substitute template parameter names with regex match strings
        size_t param_start_pos = url_candidate.find_first_of(":");
        while (param_start_pos != string::npos) {
            size_t param_len = string::npos;
            size_t param_end_pos = url_candidate.find_first_of("/",
                                                            param_start_pos);
            if (param_end_pos != string::npos) {
                param_len = param_end_pos - param_start_pos;
            }

            // Skip the leading colon
            string param_name = url_candidate.substr(param_start_pos + 1,
                                                     param_len - 1);
            names->push_back(param_name);
            url_candidate.replace(param_start_pos,
                                  param_len,
                                  "([0-9a-fA-F-]+)");
            param_start_pos = url_candidate.find_first_of(":");
        }

        tr1::regex matcher("^" + url_candidate + "$");
        tr1::match_results<string::const_iterator> matches;
        if (tr1::regex_search(uri, matches, matcher)) {
            size_t param_count = names->size();
            for (unsigned int i = 0; i < param_count; i++) {
                // Need i + 1 to get token match; matches[0] is full string.
                string param_value = matches[i + 1].str();
                values->push_back(param_value);
            }
            found_value = it->second;
            break;
        }
    }
    return value;
}

Please note that I'm not using the Boost libraries, nor am I allowed to use them for this project.

This code feels wildly inefficient to me because I'm iterating through the keys of the map every single time, but I'm suffering from not being able to see the proverbial forest for the trees, and am having a difficult time coming up with alternatives. Though the description sounds nonsensical, what I'm essentially trying to construct is a map lookup based on regex matching of the key rather than an exact match. How can I make this more efficient? What pattern have I overlooked in my design of this function?

JimEvans
  • 27,201
  • 7
  • 83
  • 108
  • 1
    Interesting question. However, note from the outset that an `std::map` is simply not the right data structure here; it provides a mapping from an exact key to a value based on a relative ordering relation of the keys. You want something completely different, namely a mapping of keys to values based on a pattern matching relation. – Konrad Rudolph Jun 25 '11 at 16:48
  • 2
    Agreed that a map isn't the correct data structure. I'm using it here as the functional equivalent of a vector where I'd be iterating through the vector. When this function started its life, the map was the correct structure, but after requirements changed, I didn't go back and correct the code to use a different data structure. Ultimately, that's the question: What's the correct data structure to use? – JimEvans Jun 25 '11 at 16:51

2 Answers2

6

The way I see it, you could split the URL into its components (using one of the suggestions in here maybe), and then use a decision tree to find the correct pattern.

In this tree, every node would be a regular expression that matches a particular component of your URL, and the leaves would be the values you currently store in your map:

                                 session
                                    |   \
                                    |    1
                                    |
                              ([0-9a-fA-F-]+)
                              /     |     \
                             /      |      \
                           url     back    element
                            |       |       |     \
                            |       |       |      5
                            2       3       |
                                        ([0-9a-fA-F-]+)

The above is a part of the tree for your example. You'd have to use a custom data-structure to implement the tree, but that is rather simple.

Community
  • 1
  • 1
Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
  • As it happens, I've taken a hybrid approach with this and the other answer, but this is the insight I was missing. – JimEvans Jun 29 '11 at 12:43
1

Rather than replacing the :session_id and :id tokens in the patterns with specific values then doing the matching, how about taking the candidates and using regex replace on them to replace the specific values with the placeholders (session_id and id)? Then you could directly look for the genericized strings in the map.

John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • Two challenges associated with that approach. First, "sessionid" and "id" are not the only token names that can appear in those positions in the URL; the example URLs only make it seem so. Secondly, the token values are hex-encoded, so they may include alpha characters as well. I've edited the question and sample code to make that more clear. – JimEvans Jun 25 '11 at 18:22