Suppose the text to search for is pqr.
"http://abc.zzz/pqr/xyz" -> Should not match
"/pqr/" -> Should match
"pqr" -> Should match
"http://abc.zzz/pqr/pqr/" -> Should not match
"http://abc.zzz/pqr/pqr/ pqr" -> Should match the last "pqr"
"www.pqr.zzz" -> Should not match
I tried using the following regex,
((?:(?:(?:https?|ftp|file|mailto):)|www)[^ ]+?)?(pqr)
I then looked at group 1: if it is empty, I treat the result as a match. But this fails for http://abc.zzz/pqr/pqr/, where the second pqr is reported as a match even though it is still inside the URL.
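To make the failure concrete, here is a minimal sketch in Python (the language is an assumption, since the regex above is not tied to one). The first pqr is consumed together with the URL prefix, so the next search resumes in the middle of the URL, where the optional prefix group no longer applies and group 1 comes back empty.

```python
import re

# The regex from the question, unchanged.
pattern = re.compile(
    r'((?:(?:(?:https?|ftp|file|mailto):)|www)[^ ]+?)?(pqr)'
)

text = "http://abc.zzz/pqr/pqr/"

# Collect (group 1, start index of "pqr") for every match.
results = [(m.group(1), m.start(2)) for m in pattern.finditer(text)]

# First match: group 1 holds the URL prefix, so it is rejected as intended.
# Second match: scanning resumes after the first "pqr", the optional
# group cannot match there, and group 1 is None, so it looks like a hit.
print(results)  # [('http://abc.zzz/', 15), (None, 19)]
```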
How can I detect that the text to match is not part of a URL?
The worst case, I think, is to detect all the URLs first and store the start and end indexes of each one, then match pqr and exclude every match that falls inside a URL. I was wondering whether there is a better way.
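For reference, the fallback described above could look roughly like the sketch below. It is in Python and rests on assumptions that are not part of the question: a URL is taken to be a run of non-whitespace characters starting with one of the schemes from the regex or with www, and the names URL_RE and find_outside_urls are made up for illustration.

```python
import re

# Assumption: a URL starts with a known scheme or "www" and runs
# until the next whitespace character.
URL_RE = re.compile(r'\b(?:(?:https?|ftp|file|mailto):|www)\S+')

def find_outside_urls(text, needle):
    """Return start indexes of `needle` that are not inside any URL."""
    # Step 1: record the (start, end) span of every URL in the text.
    url_spans = [m.span() for m in URL_RE.finditer(text)]

    hits = []
    # Step 2: find every occurrence of the needle as a literal string.
    for m in re.finditer(re.escape(needle), text):
        # Step 3: drop occurrences that lie entirely within a URL span.
        inside = any(s <= m.start() and m.end() <= e for s, e in url_spans)
        if not inside:
            hits.append(m.start())
    return hits

print(find_outside_urls("http://abc.zzz/pqr/pqr/ pqr", "pqr"))  # [24]
```

With this definition of a URL, all six examples at the top behave as listed: only "/pqr/", "pqr", and the trailing pqr after the space produce a hit.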