8

I'm using boost's regex library, and I'm finding that determining if a named match is found and then using that information is a little annoying. To detect a named match, I'd like to do this:

typedef boost::match_result<string::const_iterator> matches_t;
typedef matches_t::const_reference match_t;
boost::regex re("(?:(?<type1>aaaa)|(?<type2>bbbb)" /*...*/ "|(?<typeN>abcdefg)");
string str(SOME_STRING);
matches_t what;
boost::match_flag_type flags = boost::match_default;

if(regex_search(str.cbegin(), str.cend(), what, re, flags))
{
  if((match_t type1 = what["type1"]).matched)
  {
     // do stuff with type1
  }
  else if((match_t type2 = what["type2"]).matched)
  {
     // do stuff with type2
  }
  // ...
  else if((match_t typeN = what["typeN"]).matched)
  {
     // do stuff with typeN
  }
}

If that'd would work, that would be great. Scoping would be constrained to the if's body, memory can be used efficiently and it looks fairly clean. Sadly, it doesn't work as you cannot define a variable within a list. :(

This could have been a possibility:

if(regex_search(str.cbegin(), str.cend(), what, re, flags))
{
  match_t found = what["type1"];
  if(found.matched)
  {
     // do stuff with type1
  }
  else if((found = what["type2"]).matched)
  {
     // do stuff with type2
  }
  // ...
  else if((found = what["typeN"]).matched)
  {
     // do stuff with typeN
  }
}

But match_t is a const reference so it's not assignable. (tl;dr Also I don't know what the underlying type is and generally I don't really want to know as I'd prefer a more generic solution that I could use outside of this one example of regex. Even std::move() was used around what[...] it gets even more verbose and the documentation doesn't say that it uses move semantics for sub_match. All of this is moot of course due to the reason given in the first sentence of this paragraph.)

Another option is to do is this:

if(regex_search(str.cbegin(), str.cend(), what, re, flags))
{
  match_t type1 = what["type1"];
  if(type1.matched)
  {
     // do stuff with type1
  }
  else {
    match_t type2 = what["type2"];
    if(type2.matched)
    {
       // do stuff with type2
    }
    // ...
          else {
            match_t typeN = what["typeN"];
            if((match_t typeN = what["typeN"]).matched)
            {
               // do stuff with typeN
            }
          }
    // ...
    }
  }
}

Which I don't like due to deep nesting of braces.

Perhaps abusing a loop structure with breaks at the end of each if body like this:

if(regex_search(str.cbegin(), str.cend(), what, re, flags))
{
  do{
    {
      match_t type1 = what["type1"];
      if(type1.matched)
      {
         // do stuff with type1
         break;
      }
    }
    {
      match_t type2 = what["type2"];
      if(type2.matched)
      {
         // do stuff with type2
         break;
      }
    }
    // ...
    {
      match_t typeN = what["typeN"];
      if(typeN.matched)
      {
         // do stuff with typeN
         break;
      }
    }
  } while(0);
}

Which is better, but still not great. Using macros, much of the noise could be hidden from view. Like:

#define IF(declare, cond) do{{declare;if(cond){                
#define ELSE_IF(declare, cond) break;}}{declare; if(cond){     
#define ELSE break;}}{{                                        
#define END_IF break;}}}while(0);                              

if(regex_search(str.cbegin(), str.cend(), what, re, flags))
{
  IF(match_t type1 = what["type1"], type1.matched)
  {
     // do stuff with type1
  }
  ELSE_IF(match_t type2 = what["type2"], type2.matched)
  {
     // do stuff with type2
  }
    // ...
  ELSE_IF(match_t typeN = what["typeN"], typeN.matched)
  {
     // do stuff with typeN
  }
  END_IF
}

The braces are actually implied by the macros, but it makes for clearer reading by restating them.

One other option I can think of is to go into the boost::sub_match class and add a conversion function to convert that type to a bool who's return value would be that of the matched member. Then I could declare a match_t variable in the if expression and it would automagically be converted to a boolean value by the if. I'm not sure if I'm there yet and it's not generic.

Stylistically, are the ones I propose good or bad (only the last 3 actually work, so I'd probably confine comments to them).

Also, does anyone have any better suggestions? Please state why you think they are better.

Adrian
  • 10,246
  • 4
  • 44
  • 110
  • Note that `if(foo = bar)` is not the same as `if(foo == bar)` in C++, the code probably does not do what you want. Also I strongly recommend against using macros to "remove junk" as in your last example. It does not remove "junk" but makes the code incomprehensible. – Damon May 24 '13 at 07:39
  • @Damon: Yes, I do know that = is not the same as ==. The code does exactly what I want, to limit scope, reduce noise and reuse stack space (if the compiler optimizes it out). I sort of agree with you re: the macros, but it does reduce excess noise so long as you know what the macros are for, which can make things clearer. But YMMV. So I take it that you prefer the 2nd one with the `do { } while(0);` sans the macros? – Adrian May 24 '13 at 08:41
  • Not sure, I'd probably write the very first version (declaring the variables in the outer `if` scope, so it compiles), since that's the least obfuscated version, in my opinion. But at the end of the day, it comes down to personal taste. And, I would put double parentheses around the assignments inside `if` to make explicit that these are not accidential (with proper warnings enabled, this is what e.g. GCC suggests to do, too). – Damon May 24 '13 at 08:55
  • @Damon: The very 1st? Or the 3rd out of the 5? I ask this since the very 1st doesn't work, and it's not possible to pre-declare the variables without assigning them a value as the type is a const reference. – Adrian May 24 '13 at 14:31
  • What is the hindrance to declaring (and assigning, since they're references) the three as the first thing in the block enclosed by `if(regex_search(...))`? This should work fine, should it not? In the worst case, it initializes 2 variables in vain, but on the average it will be just good, and it's readable code :) – Damon May 24 '13 at 15:37
  • This is a list of N types like if I had a switch/case structure with N cases. Although I state a concrete case, I would like a generic solution. Having N variables declared after the `if(regex_search(...))` is wasteful as it declares N-1 variables in vain and moves the variables away from the location where they are used. – Adrian May 24 '13 at 17:31
  • Then why not consider _not_ using const references, so code snippet 2, which is very clean and nice will compile. This will copy a few matches to the stack, but so what. Unless there's 10,000 matches you couldn't care less. If you have, say, 5-10 matches, then worrying about the overhead of copying them is the typical case for quoting Tony Hoare. Also, chances are the compiler is smart enough to elide the copy anyway. – Damon May 25 '13 at 11:36

3 Answers3

6

It is generally recommended to avoid nested ifs - they make code harder to read. If there's nested if, it should probably replaced by a function call.

In your case, you need to use a loop.

your second example:

if(regex_search(str.cbegin(), str.cend(), what, re, flags))
{
  match_t found = what["type1"];
  if(found.matched)
  {
     // do stuff with type1
  }
  else if((found = what["type2"]).matched)
  {
     // do stuff with type2
  }
  // ...
  else if((found = what["typeN"]).matched)
  {
     // do stuff with typeN
  }
}

BEGS for a loop:

const char *types[] = {"type1", "type2", "typeN", 0};
for(const char **cur = types; *cur; cur++){
    found = what[*cur];
    if (found.matched){
         //initiate robot uprising
         break;
    }
}

All your other examples (IMO) are a bad coding style. I prefer to keep loops and ifs short. If it doesn't fit into 20 lines of code, then it better be doing something very complicated (which is not your case). If it doesn't do anything complicated, it needs to be restructured.

SigTerm
  • 26,089
  • 6
  • 66
  • 115
  • I definitely like this as it easily scales, is clean and easy to use a function pointer jump table with it. I think we have a winner! :) Though IMO a standard iterator style should have been used instead of using a NULL at the end of the array. – Adrian May 24 '13 at 16:24
  • Oh, and your syntax is way off. – Adrian May 24 '13 at 16:41
  • Adrian: "syntax is way off"? You mean "{ at the end of line" thing or something else? – SigTerm May 24 '13 at 19:36
  • 1
    You know what, it might not be. There are quite a few syntactic changes in C++11. I'm just upgrading to C++11 and from what I know from C++, the pointer initialisation you used is invalid. Is this new to C++11? I'm more used to `const char *types[] = {"type1", "type2", "typeN", 0};`. – Adrian May 24 '13 at 20:03
  • @Adrian: Whoops. That was a typo (I posted the code without compiling). In fact, two of them. Good job noticing the problem. I use C++03 standard at the moment, by the way. – SigTerm May 24 '13 at 20:29
  • Oh. Good to know that I'm not going insane or am too far behind the curve. ;) Even though I like this, I think I still like my macro version better for small N. I'm still on the fence with it. – Adrian May 24 '13 at 20:54
  • @Adrian: Macros make sense when you're doing something very repetitive/tedious - for example, if you have to manually unroll the loop, etc. Or if you're doing things that can't be done with templates (stringize operator) Your last example hides flow control statement but pretends to be IF, which (IMO) isn't a very good thing. Macros like this look like black magic and might seriously confuse maintainer. If I were using a macro in the loop, I would use multiline macro(readability), I would define it within the loop, I wouldn't hide the loop, and I would undef all macros before the end of loop. – SigTerm May 24 '13 at 21:08
  • If statements *are* flow control statements. The only difference with my macro `IF` and the language `if` is that the macro one allows for declaration of a variable object and then performing an operation on that object for a decision. The one thing about that macro `IF` is that it's not standard so it may make someone scratch their head for a bit if it's not documented. Once understood, it is pretty obvious what is happening, and even if it's not documented (which I don't recommend) it's still pretty obvious what the intention is. – Adrian May 24 '13 at 21:28
  • Your point about the hidden loop is interesting though. It would have to be documented that you couldn't use a `break` or `continue` statement from within this control structure as it has hijacked the loop structure to achieve its goals and will result in unexpected behaviour. That or have the loop exposed like you suggested, but that would add to the noise... Meh *shrug* – Adrian May 24 '13 at 21:42
3

You could do something like this (note this code is not tested against a compiler)

// create a map between match types and actions
std::map<std::string, std::function<match_t>> actions;
actions["type1"] = [&](match_t type) {...};

// iterate through the actions map and check each match type
for(auto kvp : actions)
{
   match_t type = what[kvp.first];
   if(type.matched)
   {
      // do stuff with this result
      kvp.second(type);
   }
}
Marius Bancila
  • 16,053
  • 9
  • 49
  • 91
  • Hmmm, interesting. You forgot the break at the end of your if body though. Other potential issues are 1. you can't define which is to be tested first (though this could be done using a vector of a tuple). And 2. moves the code away from execution point. But using a lambda like this is definitely interesting. Could also be done with a full fledged function. +1 for ingenuity. :) – Adrian May 24 '13 at 05:41
  • What are people's opinions of the readability of this? An action should be fairly short. No more than 10 lines. – Adrian May 24 '13 at 05:47
  • 1
    You can replace the map with a list or other sequence container so you can control the order of the matches evaluation. – Marius Bancila May 24 '13 at 07:28
  • Yeah, I did mention that in my first comment. Though I stated it as a vector instead of a generic sequence container. – Adrian May 24 '13 at 08:27
0

You can write a wrap of match_t with an overloads of operator bool:

struct bmatch
{
    matches_t::const_reference ref;
    bmatch(matches_t::const_reference r)
    :ref(r)
    {}

    operator bool() const
    {
        return ref.matched;
    }
};

And then:

if (bmatch type1 = what["type1"])
{ //Use type2.ref
}
else if (bmatch type2 = what["type2"])
{ //Use type2.ref
}

For extra points you can also overload operator->:

    matches_t::const_reference operator->() const
    {
        return ref;
    }
rodrigo
  • 94,151
  • 12
  • 143
  • 190