
I have been trying to get this regex right all morning and I have hit a wall. In the following string I want to match every forward slash that follows .com/<first_word>, except for any / after the URL.

$string = "http://example.com/foo/12/jacket Input/Output";
    match------------------------^--^

The length of the words between slashes should not matter.

Regex: (?<=.com\/\w)(\/) results:

$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";   
    matches--------------------^

Regex: (?<=\/\w)(\/) results:

$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
    matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match inside Input/O/utput
    matches--------------------^-^--------------^                    
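The results listed above can be reproduced with a short script. This is a sketch; the `$attempts` and `$results` names are only for illustration. Each of these lookbehinds looks back a fixed number of characters (exactly one `\w` after the slash), which is why `foo` defeats the first pattern and why the second one also fires inside `Input/O/utput`.

```php
<?php
// Re-run the lookbehind attempts from the question and print the byte offset
// of every slash each pattern matches.
$attempts = [
    ['/(?<=.com\/\w)(\/)/', "http://example.com/foo/12/jacket Input/Output"],
    ['/(?<=.com\/\w)(\/)/', "http://example.com/f/12/jacket Input/Output"],
    ['/(?<=\/\w)(\/)/',     "http://example.com/f/2/jacket Input/O/utput"],
];

$results = [];
foreach ($attempts as [$re, $str]) {
    preg_match_all($re, $str, $m, PREG_OFFSET_CAPTURE);
    $offsets = array_column($m[0], 1);   // offset of each matched "/"
    $results[] = $offsets;
    echo $re, ' on ', $str, ': ', $offsets ? implode(', ', $offsets) : 'no match', PHP_EOL;
}
```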

Because a lookbehind cannot contain quantifiers such as + or * (it must be fixed length) and is a zero-length assertion, I am wondering if I have gone down the wrong path and should look for a different regex construction.

Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?

NOTE: tagged with PHP because the regex should work in any of the preg_* functions.

Jay Blanchard
  • The `preg_match` function returns only one match. You say you need to match all the characters that come after some pattern, so you should use `preg_match_all`. – Wiktor Stribiżew Feb 11 '16 at 17:58
  • I still have the impression it is an XY problem. What are you trying to achieve? Why match those slashes? You could run the URL through parse_url and then do whatever you please, e.g. explode it. – Wiktor Stribiżew Feb 11 '16 at 18:29
  • No, it isn't an XY problem @WiktorStribiżew as the regex should work in *any* of the `preg_*` functions. – Jay Blanchard Feb 11 '16 at 18:42
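For comparison, the non-regex route suggested in the comments might look like the sketch below. It assumes (this is not stated in the question) that the URL is the first space-delimited token of the string; `parse_url()` then isolates the path and `explode()` splits it into segments. It does not give a single pattern usable across the `preg_*` functions, which is what the question asks for.

```php
<?php
// Non-regex sketch of the parse_url()/explode() idea from the comments.
// Assumption: the URL is the first space-delimited token of the string.
$str = "http://example.com/foo/12/jacket Input/Output";

$url      = explode(' ', $str, 2)[0];            // "http://example.com/foo/12/jacket"
$path     = parse_url($url, PHP_URL_PATH);       // "/foo/12/jacket"
$segments = explode('/', ltrim($path, '/'));     // ["foo", "12", "jacket"]

echo implode('|', $segments), PHP_EOL;           // foo|12|jacket
```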

3 Answers


Use \K here along with \G, then grab the groups.

^.*?\.com\/\w+\K|\G(\/)\w+\K

See demo.

https://regex101.com/r/aT3kG2/6

$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m"; 
$str = "http://example.com/foo/12/jacket Input/Output"; 

preg_match_all($re, $str, $matches);

Replace

$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m"; 
$str = "http://example.com/foo/12/jacket Input/Output"; 
$subst = "|"; 

$result = preg_replace($re, $subst, $str);
vks
  • For whatever reason this is not working in the context of `preg_match()` – Jay Blanchard Feb 11 '16 at 17:52
  • OK, `preg_match_all()` works, but other `preg_*` functions fail, like `preg_replace()`. – Jay Blanchard Feb 11 '16 at 18:03
  • @JayBlanchard the only problem with replace is that one extra replacement will happen at the end... I guess that will have to be dealt with separately – vks Feb 11 '16 at 18:06
  • I got this as the return `http://example.com/foo|/12/jacket Input/Output` using your code. Note that none of the slashes have been removed, but a pipe has been added. Even in your substitution example the slashes are left in place. I *really* appreciate your help with this; it seems that we're both beating our heads against the wall. One of the +1's is mine. – Jay Blanchard Feb 11 '16 at 18:08
  • @JayBlanchard we can do it like this... you just need to remove an extra `|` later. https://regex101.com/r/aT3kG2/8 – vks Feb 11 '16 at 18:16
  • Can you also copy those regexes into your answer? They may not stay on regex101 forever, which would make your answers useless to others. – RiggsFolly Feb 11 '16 at 18:37
  • That last example you provided added spaces and an extra pipe @vks `http://example.com/foo||12|jacket|wow` – Jay Blanchard Feb 11 '16 at 18:46

If you want to use preg_replace then this regex should work:

$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output

This replaces each / with a |, starting after the first path segment that follows the leading .com/ and stopping at the first whitespace.

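The negative lookbehind (?<!^) is needed because \G also matches at the very start of the string; without it, a string that does not begin with a .com URL, such as /foo/bar/baz/abcd, would have its slashes replaced as well.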

RegEx Demo
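A self-contained sketch that runs this answer's pattern through `preg_replace()` and `preg_match_all()`; the variable names and the extra `/foo/bar/baz/abcd` check are illustrative additions. Because `\K` drops everything before the slash, each match is just the `/`, and `PREG_OFFSET_CAPTURE` reports where it sits.

```php
<?php
// Pattern from the answer:
//   (?:^.*?\.com/|(?<!^)\G)  start at the leading "...com/" or exactly where the
//                            previous match ended (never at offset 0)
//   [^/\h]*                  skip the current path segment (no slash, no blank)
//   \K/                      discard everything so far; only the "/" is the match
$re  = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";

// preg_replace: every slash after the first path segment becomes "|".
$replaced = preg_replace($re, '|', $str);
echo $replaced, PHP_EOL;   // http://example.com/foo|12|jacket Input/Output

// preg_match_all with offsets: each match is a single "/" plus its position.
preg_match_all($re, $str, $m, PREG_OFFSET_CAPTURE);
foreach ($m[0] as [$slash, $pos]) {
    echo "'$slash' at offset $pos", PHP_EOL;   // offsets 22 and 25
}

// A string that does not start with a .com URL is left untouched,
// because (?<!^) stops \G from matching at offset 0.
echo preg_replace($re, '|', '/foo/bar/baz/abcd'), PHP_EOL;   // /foo/bar/baz/abcd
```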

anubhava

Another \G and \K based idea.

$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';

See demo at regex101
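A sketch applying this pattern with `preg_replace()` and `preg_split()`, since the question asks for something usable across the `preg_*` functions; the variable names are illustrative.

```php
<?php
// Pattern from the answer:
//   (?:^\S+\.com/\w|\G(?!^))  start after the first word character following ".com/"
//                             at the beginning of the string, or exactly where the
//                             previous match ended (\G(?!^) excludes offset 0)
//   \w*+                      possessively consume the rest of the segment
//   \K/                       keep only the slash as the match
$re  = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";

echo preg_replace($re, '|', $str), PHP_EOL;   // http://example.com/foo|12|jacket Input/Output

// Only the two URL slashes act as separators:
// ['http://example.com/foo', '12', 'jacket Input/Output']
$parts = preg_split($re, $str);
print_r($parts);
```

One design difference from the previous answer: segments are consumed with `\w*+`, so a segment containing a character outside `\w` (for example `foo-bar`) ends the chain before its slash, whereas `[^/\h]*` accepts it. The possessive `*+` just avoids pointless backtracking when a segment is not followed by `/`.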

bobble bubble