
I need help understanding how to iterate over the search results from a boost::sregex_iterator. I am passing a ';'-delimited set of IP addresses on the command line, and I would like to process each IP address in turn using a boost::sregex_iterator.

The code below demonstrates what I am trying to do and also shows a workaround using workingIterRegex; however, the workaround limits the richness of my regular expression. I tried modifying nonworkingIterregex, but it only returns the last IP address to the lambda.

Does anyone know how I can loop over each IP address individually without having to resort to such a hacked and simplistic workingIterRegex?

I found the following example, http://www.cs.ucr.edu/~cshelton/courses/cppsem/regex2.cc, which shows how to call the lambda with the individual sub-matches.

I also used the example in "looping through sregex_iterator results" to get access to the sub-matches; however, it gave similar results.

With workingIterRegex, the code prints out the IP addresses one per line:

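#include <algorithm>   // std::for_each
#include <iostream>    // std::cout
#include <stdexcept>   // std::invalid_argument
#include <string>
#include <boost/regex.hpp>
...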
std::string errorMsg;
std::string testStr("192.168.1.1;192.168.33.1;192.168.34.1;192.168.2.1");
static const boost::regex nonworkingIterregex (
    "((\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3});?)+",
    boost::regex_constants::icase);
boost::smatch match;
if (boost::regex_match(testStr, match, nonworkingIterregex)) {            
    static const boost::regex workingIterRegex("[^;]+");
    std::for_each(boost::sregex_iterator(
        begin(iter->second), end(iter->second), workingIterRegex),
        boost::sregex_iterator(),
        [](const boost::smatch &match){ 
            std::cout << match.str() << std::endl;
        }
    );

    mNICIPAddrs = UtlStringUtils::tokenize(iter->second, ";");
    std::string errorMsg;
} else {
    errorMsg = "Malformed CLI Arg:" + iter->second;
}
if (!errorMsg.empty()) {
    throw std::invalid_argument(errorMsg);
}
...

After some experimentation I found that the following worked - but I am not sure why the c

johnco3

1 Answer


Try using your regex expression like this:

(\\d{1,3}\\.){3}(\\d{1,3})
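
A minimal sketch of why this pattern iterates where the original did not. It assumes std::regex as a stand-in for boost::regex (the iterator interface is used the same way here), and the helper names `all_matches`, `extract_ips` and `extract_with_repeated_group` are hypothetical, not from the question. With one address per match, the iterator yields each address in turn; with the outer `(...;?)+` group, a single match consumes the whole list.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the text of every match of `re` in `input`.
std::vector<std::string> all_matches(const std::string& input, const std::regex& re) {
    std::vector<std::string> out;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(input.begin(), input.end(), re), last; it != last; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// One address per match: no ';' and no outer '+' in the pattern.
std::vector<std::string> extract_ips(const std::string& input) {
    static const std::regex ip_re(R"((\d{1,3}\.){3}\d{1,3})");
    return all_matches(input, ip_re);
}

// The shape of the pattern from the question: the repeated group also
// consumes the ';', so one match spans the entire list.
std::vector<std::string> extract_with_repeated_group(const std::string& input) {
    static const std::regex whole_re(R"(((\d{1,3}\.){3}\d{1,3};?)+)");
    return all_matches(input, whole_re);
}
```

Called on "192.168.1.1;192.168.33.1;192.168.34.1;192.168.2.1", `extract_ips` should return the four addresses separately, while `extract_with_repeated_group` should return a single element containing the whole string. The raw string literals (`R"(...)"`) are only there to avoid the doubled backslashes.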
l'L'l
  • Note: you may have to wrap the expression in `(`...`)`, giving `((\\d{1,3}\\.){3}(\\d{1,3}))` – l'L'l Jan 21 '14 at 20:08
  • Thanks, both the wrapped and unwrapped expressions above actually worked; however, I am not really sure why. I can see that neither expression includes the ";" delimiter that I use to separate the individual IP addresses. With sregex_iterator it is not obvious why I cannot use the regular expression "((\\d{1,3}\\.){3}(\\d{1,3});?)+". Using that expression causes my lambda to be called once, with all the IP addresses stuck together with the ';' characters. – johnco3 Jan 21 '14 at 20:13
  • The reason it didn't work the way you had it is that the delimiter `;` was included in the pattern. Because the repeated group consumes the `;` as well, a single match swallows the whole list and there is nothing left to iterate over. If you take the `;` out of your original regex, you'll see that it should work too. I also simplified it a bit :) – l'L'l Jan 21 '14 at 20:16
  • Yes, I discovered that just now. So is there any way to ensure that the IP addresses are correctly delimited by ';' characters, or would that require running the above regex (with the ';') through boost::regex_match first, before digging for the sub-expressions using my lambda? – johnco3 Jan 21 '14 at 20:18
  • You can test your regex with any regex tester; for example, [http://www.rubular.com/r/3emXGqEDMF](http://www.rubular.com/r/3emXGqEDMF) and [http://refiddle.com/h3h](http://refiddle.com/h3h) would be good ways to quickly verify that things are as they should be. For the lambda you need to modify the expression slightly by adding an extra backslash for escaped characters. – l'L'l Jan 21 '14 at 20:28
  • Thanks, I've been using a similar one, http://www.regexr.com/. The regular expressions always need a double backslash as far as I am aware. I know that there is an extension to C++ for raw string literals (I think using an 'R' string prefix) that will eventually make the whole double-backslash thing go away. BTW, although the above worked, I am not sure why, because another example I am using, with multiple paths separated by ';' delimiters, does not work. For example, if I use '/tmp/abc._dr;/tmp/def._dr' with the regex '(\\S+\\._dr)', the lambda prints the entire string above. – johnco3 Jan 21 '14 at 20:56
  • `\S` means `[ not whitespace ]`, which also matches the `;`, so it obviously won't work. You might want to try something such as `(\\W\\w++){2}(\\.)(_dr)` if you know you always have `/xxx/xxx._xx`;`/xxx/xxx._xx` – l'L'l Jan 21 '14 at 21:18
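
The two open points in the comments above (checking that the list is properly ';'-delimited, and splitting ';'-separated paths) can be sketched as follows. This is an illustration under stated assumptions, not the approach from the answer: it uses std::regex as a stand-in for boost::regex, the helper names `parse_ip_list` and `split_dr_paths` are hypothetical, a trailing ';' is assumed to be invalid, and `[^;]+\._dr` is a swapped-in alternative to the pattern suggested in the last comment.

```cpp
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: validate the whole list first, then extract each address.
// Assumption: addresses are separated by exactly one ';' with no trailing ';'.
std::vector<std::string> parse_ip_list(const std::string& input) {
    static const std::regex list_re(R"((\d{1,3}\.){3}\d{1,3}(;(\d{1,3}\.){3}\d{1,3})*)");
    static const std::regex ip_re(R"((\d{1,3}\.){3}\d{1,3})");

    // regex_match succeeds only if the pattern covers the entire string.
    if (!std::regex_match(input, list_re)) {
        throw std::invalid_argument("Malformed CLI Arg:" + input);
    }

    std::vector<std::string> ips;
    for (std::sregex_iterator it(input.begin(), input.end(), ip_re), last; it != last; ++it) {
        ips.push_back(it->str());
    }
    return ips;
}

// Hypothetical helper: split ';'-separated paths that end in "._dr".
// [^;]+ cannot cross a ';', so each match stops at the delimiter,
// unlike \S+, which treats ';' as just another non-whitespace character.
std::vector<std::string> split_dr_paths(const std::string& input) {
    static const std::regex path_re(R"([^;]+\._dr)");
    std::vector<std::string> paths;
    for (std::sregex_iterator it(input.begin(), input.end(), path_re), last; it != last; ++it) {
        paths.push_back(it->str());
    }
    return paths;
}
```

Keeping validation (`regex_match` over the whole string) separate from extraction (`sregex_iterator` with a single-item pattern) lets each pattern stay simple; whether a trailing ';' or empty entries should be tolerated is a design choice the sketch does not settle.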