
I've been searching for and testing regexes to match all URIs, but I can't find one that matches all or even most of them. Many of the ones I've tried throw a compile error. Does anyone have a regex that is compatible with Xpressive's sregex?

drwbns
  • It most likely throws a compile error because you included the delimiters or did not escape the string properly. – nhahtdh Jan 12 '13 at 15:59
  • I have this regex, does it look correct? "^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?" – drwbns Jan 12 '13 at 16:18
  • Ah ok I found this post - http://stackoverflow.com/questions/1252992/how-to-escape-a-string-for-use-in-boost-regex – drwbns Jan 12 '13 at 16:19
  • I also have this one - is this correct? "(ftp|http|https):\/\/(\w+\.)*(\w*)\/([\w\d]+\/{0,1})+" – drwbns Jan 12 '13 at 16:24
  • I should mention that the above regex doesn't match anything. I'm also using the regex_search function to match substrings within a string. – drwbns Jan 12 '13 at 16:35
  • Here are 8 regexps for common things like e-mail addresses, IP addresses, ... Number 6 in the list handles URLs. – Sander Jan 12 '13 at 20:00
  • I think you missed the URL, but is this the right way of escaping? sregex::compile("(\?:ftp\|https\?:\/\/)\+(\?:\\S+\.)\+(\?:\\S\.)\+(\?:\S)\+"); – drwbns Jan 12 '13 at 20:33
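
On the escaping question in the comments: in a C++ string literal the compiler consumes backslashes before the regex engine ever sees them, so only the backslashes that belong to the regex (such as \s) need doubling, and regex metacharacters like ( ? | + should not be escaped at all unless they are meant literally. A raw string literal avoids the doubling entirely. The sketch below uses std::regex rather than Xpressive purely so that it has no external dependency; the literal-escaping rule is a property of C++ and should carry over to sregex::compile, though that is not verified here. The helper name first_url and the deliberately simplified pattern are assumptions for illustration, not a complete URI grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first ftp/http/https URL found in `text`,
// or an empty string when nothing matches.
std::string first_url(const std::string& text) {
    // Raw string literal: the pattern is written exactly as the regex engine
    // receives it. The equivalent ordinary literal needs doubled backslashes:
    //   "(?:ftp|https?)://[^\\s/?#]+\\S*"
    static const std::regex url_re(R"((?:ftp|https?)://[^\s/?#]+\S*)");

    std::smatch m;
    // regex_search looks for the pattern anywhere inside `text`.
    if (std::regex_search(text, m, url_re)) {
        return m.str();
    }
    return std::string();
}
```

Note that the slashes in :// are not escaped: / is an ordinary character in this regex syntax, and \/ in a normal C++ string literal is not a recognized escape sequence.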

1 Answer


Something you can start from:

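#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

// Static regex; captures: s1 = scheme, s2 = authority, s3 = path,
// s4 = query (optional), s5 = fragment (optional).
static const sregex re = _b >> (s1 = +(~(set= ':', '/', '?', '#')))
                            >> as_xpr("://")
                            >> (s2 = *(~(set= '/', '?', '#')))
                            >> (s3 = *(~(set= '?', '#')))
                            >> !(as_xpr('?') >> (s4 = *(~(set='#'))))
                            >> !(as_xpr('#') >> (s5 = *_)) >> _b;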