2

I'm stuck on a simple regex to match URLs in some content. The goal is to strip the folder from links like "/folder/id/123" and replace them with "id/123", a short relative link within the same folder.

Currently I have:

    $pattern = "/\/?(\w+)\/id\/(\d)/i";
    $replacement = "id/$2";
    return preg_replace($pattern, $replacement, $text);

and it seems to work fine.

However, the last check I'd like to add is that each matched URL does NOT contain http://, in case it points to an external site that also uses the same /folder/id/123 pattern.

I tried /[^http://] or (?<!html)... and various other things without success. Any help would be very nice :-)

    $pattern = "/(?<!http)\b\/?(\w+)\/id\/(\d)/i"; ???????

Thanks!

Here are some examples. Thank you VERY MUCH for your help :-)

(these should be replaced, "same folder" => short relative path only)
<a href="/mysite_admin/id/414">label</a> ==> <a href="id/414">label</a>
<a href="/mYsITe_ADMIN/iD/29">label with UPPERCASE</a> ==> <a href="id/29">label with UPPERCASE</a>

(these should not be replaced: when there is http://, it is an external site and there is nothing to do)
<a href="http://mysite_admin/id/414">label</a> ==> <a href="http://mysite_admin/id/414">label</a>
<a href="http://www.google_admin.com">label</a> ==> <a href="http://www.google_admin.com">label</a>
<a href="http://anotherwebsite.com/id/32131">label</a> ==> <a href="http://anotherwebsite.com/id/32131">label</a>
<a href="http://anotherwebsite_admin.com/id/32131">label</a> ==> <a href="http://anotherwebsite_admin.com/id/32131">label</a>
jenny_j
  • Please provide an example. Your task sounds simple, but I can't see any string to be checked. – gaussblurinc Aug 13 '12 at 14:18
  • You can do the same thing using one regular expression to check if the url starts with http and another regular expression, inside a php if-statement, to perform the replacement. It's easier to write and understand, and one complex pcre probably won't even be more efficient. – Vortexfive Aug 13 '12 at 14:24
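
The two-step approach suggested in the comment above could be sketched roughly as follows. This is only an illustration: the helper name `shortenLocalHrefs` and the `href="..."` matching pattern are assumptions made for the example and do not come from the thread.

```php
<?php
// Hypothetical helper: the name and the href-matching pattern are assumptions.
function shortenLocalHrefs(string $html): string
{
    return preg_replace_callback(
        '/href="([^"]*)"/i',
        function (array $m): string {
            $url = $m[1];
            // Step 1: leave anything that starts with http:// or https:// untouched.
            if (preg_match('/^https?:\/\//i', $url)) {
                return $m[0];
            }
            // Step 2: rewrite "/folder/id/123" to "id/123"; other URLs pass through unchanged.
            $short = preg_replace('/^\/?\w+\/id\/(\d+)$/i', 'id/$1', $url);
            return 'href="' . $short . '"';
        },
        $html
    );
}

echo shortenLocalHrefs('<a href="/mysite_admin/id/414">label</a>'), PHP_EOL;
```

Splitting the check from the replacement keeps each pattern short, at the cost of running a callback for every href in the text.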

2 Answers

3

There's no need for the <, which marks a lookbehind assertion. Just use /^(?!http)\/?(\w+)\/node\/(\d)/i as the pattern: it matches /foo/node/123, but not http://www.google.com/foo/node/123.

This question provides a nice overview that can help you with this.
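
As a quick check of how this pattern behaves, here is a small sketch. The three sample strings are made up for illustration. Because of the `^` anchor, the pattern can only match at the very start of the subject string, so it behaves differently on a bare URL than on a URL embedded in HTML.

```php
<?php
// Pattern from the answer above ("node" is the answer's stand-in for the "id" segment).
$pattern = '/^(?!http)\/?(\w+)\/node\/(\d)/i';

// Hypothetical sample inputs, not taken from the question.
$bareRelative = '/foo/node/123';
$bareAbsolute = 'http://www.google.com/foo/node/123';
$insideHtml   = '<a href="/foo/node/123">label</a>';

$r1 = preg_replace($pattern, 'id/$2', $bareRelative); // rewritten
$r2 = preg_replace($pattern, 'id/$2', $bareAbsolute); // left alone: starts with "http"
$r3 = preg_replace($pattern, 'id/$2', $insideHtml);   // left alone: "^" cannot match mid-string

echo $r1, PHP_EOL, $r2, PHP_EOL, $r3, PHP_EOL;
```

The third case suggests why nothing gets replaced when the pattern is run over a whole block of HTML rather than over one URL at a time.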

Elias Van Ootegem
  • Well, thanks, now the links with http are no longer matched. Great! Hmm, in the end it seems that nothing is matched at all :-/ I no longer get any replacements. Anyway, thanks, I'll try to learn from the link you provided. – jenny_j Aug 13 '12 at 14:00
  • Well, `echo preg_replace('/^(?!http)\/?(\w+)\/node\/(\d)/i','id/$2','foo/node/123');` echoes 'id/123' for me. You can test and goof around [here](http://www.functions-online.com/preg_replace.html) until this pattern works for you. Also, and I don't like to nag about this, but if an answer was helpful or solved your problem, it's customary to upvote or accept it ;-) – Elias Van Ootegem Aug 13 '12 at 14:18
0

From the fine PHP manual - Assertions:

Note that the apparently similar pattern (?!foo)bar does not find an occurrence of "bar" that is preceded by something other than "foo"; it finds any occurrence of "bar" whatsoever, because the assertion (?!foo) is always TRUE when the next three characters are "bar". A lookbehind assertion is needed to achieve this effect.

such as:

    $pattern = "/(?<!http:\/)\/(\w+)\/id\/(\d)/i";
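
To see the lookbehind pattern in action, here is a minimal sketch that runs it over some of the sample links from the question. The wrapper function `shortenLocalLinks` is a hypothetical name introduced only for this example.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function shortenLocalLinks(string $html): string
{
    // (?<!http:\/) rejects a slash that directly follows "http:/",
    // i.e. the second slash of "http://".
    $pattern = '/(?<!http:\/)\/(\w+)\/id\/(\d)/i';
    return preg_replace($pattern, 'id/$2', $html);
}

// Sample links taken from the question.
$samples = [
    '<a href="/mysite_admin/id/414">label</a>',
    '<a href="/mYsITe_ADMIN/iD/29">label with UPPERCASE</a>',
    '<a href="http://mysite_admin/id/414">label</a>',
    '<a href="http://anotherwebsite.com/id/32131">label</a>',
];

foreach ($samples as $sample) {
    echo shortenLocalLinks($sample), PHP_EOL;
}
```

The first two links are shortened and the last two are left as they are. One caveat: the lookbehind only inspects the slash right after "http:", so an external URL that itself contains a folder, such as http://example.com/folder/id/123, would still be rewritten.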
Armali