I've been searching for and testing regexes to match URIs, but I can't find one that matches all or most of them, and many of the ones I've tried throw a compile error. Does anyone have a regex that is compatible with boost::xpressive::sregex?
-
It probably throws a compile error because you included the delimiters or did not escape the string properly. – nhahtdh Jan 12 '13 at 15:59
-
I have this regex, does it look correct? "^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?" – drwbns Jan 12 '13 at 16:18
-
Ah ok I found this post - http://stackoverflow.com/questions/1252992/how-to-escape-a-string-for-use-in-boost-regex – drwbns Jan 12 '13 at 16:19
-
I also have this one - is this correct? "(ftp|http|https):\/\/(\w+\.)*(\w*)\/([\w\d]+\/{0,1})+" – drwbns Jan 12 '13 at 16:24
-
I wanted to mention that the above regex doesn't match anything. I'm also using the regex_search function to match substrings within a string. – drwbns Jan 12 '13 at 16:35
-
Here are 8 regexes for common things like e-mail addresses, IP addresses, ... Number 6 in the list handles URLs. – Sander Jan 12 '13 at 20:00
-
I think you missed the url but is this the right way of escaping? sregex::compile("(\?:ftp\|https\?:\/\/)\+(\?:\\S+\.)\+(\?:\\S\.)\+(\?:\S)\+"); – drwbns Jan 12 '13 at 20:33
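On the escaping question: a C++ string literal consumes one level of backslashes before the regex engine ever sees the pattern. So \S has to be written as "\\S" or placed in a raw string literal, and compilers typically warn about unknown escape sequences such as \| or \+. The characters ?, |, + and / need no backslash for C++, and escaping ? or + at the regex level turns them into literal characters instead of operators. Here is a minimal sketch of the idea. It uses std::regex as a stand-in for a dynamic regex (an assumption made so the example needs only the standard library; sregex::compile is not exercised here), and find_url is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first ftp/http/https URL-like substring
// of `text`, or an empty string if none is found.
std::string find_url(const std::string& text) {
    // Raw string literal: the regex engine receives \S exactly as written.
    // The equivalent ordinary literal would be "(?:ftp|https?)://\\S+".
    static const std::regex re(R"((?:ftp|https?)://\S+)");
    std::smatch m;
    // regex_search looks for the pattern anywhere inside the input.
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Calling find_url on a sentence that contains a link returns just the link, which is the regex_search behaviour asked about above. The pattern is deliberately loose (everything up to the next whitespace), so trailing punctuation would be included.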
1 Answer
Something you can start from:
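#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
// Captures: s1 = scheme, s2 = authority (host[:port]), s3 = path,
//           s4 = query (optional), s5 = fragment (optional)
static const sregex re = _b >> (s1 = +(~(set= ':', '/', '?', '#')))
    >> as_xpr("://")
    >> (s2 = *(~(set= '/', '?', '#')))
    >> (s3 = *(~(set= '?', '#')))
    >> !(as_xpr('?') >> (s4 = *(~(set= '#'))))
    >> !(as_xpr('#') >> (s5 = *_)) >> _b;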
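The same five-part structure can also be written as a pattern string, which may be easier to experiment with. The sketch below is an approximation rather than a translation of the static regex above. It uses std::regex instead of Xpressive (an assumption, so that it builds with the standard library alone), it drops the word-boundary assertions, and it uses regex_match, so the input must consist of exactly one URI. UriParts and split_uri are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the five captured pieces (s1..s5 above).
struct UriParts {
    std::string scheme, authority, path, query, fragment;
};

// Hypothetical helper: splits a string that consists of exactly one URI.
// Returns false if the string does not have the form scheme://...
bool split_uri(const std::string& uri, UriParts& out) {
    static const std::regex re(
        R"(([^:/?#]+)://([^/?#]*)([^?#]*)(?:\?([^#]*))?(?:#(.*))?)");
    std::smatch m;
    if (!std::regex_match(uri, m, re)) return false;
    // Groups that did not participate (no query or fragment) yield "".
    out = UriParts{m.str(1), m.str(2), m.str(3), m.str(4), m.str(5)};
    return true;
}
```

For "http://example.com:8080/a/b?x=1&y=2#top" this yields scheme "http", authority "example.com:8080", path "/a/b", query "x=1&y=2" and fragment "top". Like the static version, it only splits the string at delimiters and does not validate the characters inside each component.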