I am trying to match URLs or relative paths that do not contain a second colon (after the one in the protocol, e.g., http(s)://).
I want to reject URLs of the form
https://en.wikipedia.org/wiki/Special:BookSources/0-8018-1841-9
or paths of the form
/wiki/Special:BookSources/0-8018-1841-9
with one exception. I want to keep the ones with a second colon if it is followed by an underscore:
https://en.wikipedia.org/wiki/The_Post_Card:_From_Socrates_to_Freud_and_Beyond
or
/wiki/The_Post_Card:_From_Socrates_to_Freud_and_Beyond
The regex I have now (based on this question and this one) is ^[^:]*[:]*.*(/wiki/)[^:]+$, which solves the first part of my requirement, but not the second.
How would I account for the special case of a colon followed by an underscore?
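
For reference, here is a minimal sketch of one direction that might work. It assumes Python's re module, since the regex flavor is not stated above. It keeps the existing pattern but replaces the trailing [^:]+ with an alternation that accepts either a non-colon character or the two-character sequence ":_". The names WIKI_PATH and is_wanted are hypothetical and only used for illustration.

```python
import re

# Same prefix as the original pattern; only the tail after /wiki/ changes.
# (?:[^:]|:_)+ accepts any non-colon character, or a colon that is
# immediately followed by an underscore.
WIKI_PATH = re.compile(r"^[^:]*[:]*.*(/wiki/)(?:[^:]|:_)+$")


def is_wanted(url: str) -> bool:
    """Return True if the URL or relative path should be kept."""
    return WIKI_PATH.search(url) is not None


if __name__ == "__main__":
    samples = [
        "https://en.wikipedia.org/wiki/Special:BookSources/0-8018-1841-9",
        "/wiki/Special:BookSources/0-8018-1841-9",
        "https://en.wikipedia.org/wiki/The_Post_Card:_From_Socrates_to_Freud_and_Beyond",
        "/wiki/The_Post_Card:_From_Socrates_to_Freud_and_Beyond",
    ]
    for s in samples:
        print(is_wanted(s), s)
```

With these four samples, the two Special:BookSources entries are rejected and the two The_Post_Card:_ entries are kept. In flavors that support lookahead, (?:[^:]|:(?=_))+ would presumably be an equivalent way to write the tail.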