
I have this simple piece of code in c++:

#include <iostream>
#include <string>
#include <pcrecpp.h>

int main(void)
{
    std::string text = "http://www.amazon.com";
    std::string a, b, c, d, e, f;  // one output string per capture group we read
    pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
    if (re.PartialMatch(text, &a, &b, &c, &d, &e, &f))
    {
        // The hostname is the sixth capture group, hence "f".
        std::cout << "match: " << f << "\n";
        // should print "www.amazon.com"
    } else {
        std::cout << "no match. \n";
    }
    return 0;
}

When I run this, it doesn't find a match. I'm pretty sure that the regex pattern is correct and that my code is what's wrong. If anyone familiar with pcrecpp can take a look at this, I'll be grateful.

EDIT: Thanks to Dingo, it works great now.
Another issue I had was that the result I wanted was in the sixth capture group, "f".
I edited the code above so you can copy/paste it if you wish.

shaimagz

2 Answers


Please do cout << re.pattern() << endl; to double-check that all your double-slashing is done right (and also post the result).

It looks like the pattern should come out as:

^((\w+):///?)?((\w+):?(\w+)?@)?([^/\?:]+):?(\d+)?(/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)

The hostname isn't going to be returned from the first capture group. Why are you putting parentheses around parts, for example \w+, that you don't want to capture?
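To illustrate the point about capture groups, here is a minimal sketch that uses non-capturing groups, (?:...), so that the hostname ends up in group 1. It is not the original code: it assumes std::regex from the standard library instead of pcrecpp, the pattern is shortened to the scheme, credentials and host parts only, and extract_host is a hypothetical helper name. PCRE also supports the (?:...) syntax, so the same idea should carry over to pcrecpp::RE.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns the host part of
// a URL, or an empty string if the pattern does not match.
std::string extract_host(const std::string& url) {
    // (?:...) groups without capturing, so the host is capture group 1.
    static const std::regex re(
        "^(?:\\w+:///?)?"      // optional scheme, e.g. "http://"
        "(?:\\w+:?\\w*@)?"     // optional "user:password@"
        "([^/?:]+)");          // host: everything up to '/', '?' or ':'
    std::smatch m;
    if (std::regex_search(url, m, re)) {
        return m[1].str();
    }
    return "";
}
```

Calling extract_host("http://www.amazon.com") returns "www.amazon.com", with no need to count up to the sixth group.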

Ben Voigt

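The problem is that your string literal contains ??(, which is a trigraph in C++ for [. The compiler replaces it before the regex library ever sees the pattern, so the pattern you pass is no longer the one you wrote. You'll need to either disable trigraphs or break the sequence up, for example by splitting the literal: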

pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??" "([^#]+)?#?(\\w*)"); 
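As a small sketch of why the split works, the snippet below builds only the tail of the pattern in two trigraph-safe ways and leaves pcrecpp out so it compiles with the standard library alone. The function names tail_split and tail_escaped are made up for illustration. Adjacent string literals are concatenated only after trigraph replacement has run, so the source never contains the three characters in a row; escaping one of the question marks as \? has the same effect. Trigraphs were removed from the language in C++17, so this mainly matters for older standards or compilers that still enable them.

```cpp
#include <string>

// Tail of the URL pattern written as two adjacent string literals. The
// sequence "question mark, question mark, open paren" never appears in the
// source, so no trigraph replacement can happen.
std::string tail_split() {
    return "\\??" "([^#]+)?#?(\\w*)";
}

// The same tail in a single literal, with the second question mark written
// as the escape \? so the source again has no trigraph sequence.
std::string tail_escaped() {
    return "\\?\?([^#]+)?#?(\\w*)";
}
```

Both functions return the same 18-character string, which is the pattern text the regex engine should actually receive.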
Dingo