A regex doesn't work

Question

I'm trying to use regex to find FINDTHIS: bla bla the link: <a href="http://www.exemple.net/index.php?p…

My code is:

$delimiter = '#';
$startTag = '<a href="http://www.exemple.net/index.php?p…'; // shortened
$endTag = '/1/">';
$regex = $delimiter . preg_quote($startTag, $delimiter) 
. '(.*?)' 
. preg_quote($endTag, $delimiter) 
. $delimiter 
. 's';
preg_match($regex,$result,$matches);
$category = $matches;
print_r($category);

But I get nothing...

What is the problem? Thanks!

Also, please read http://stackoverflow.com/a/1732454/110707 -- there's a reason it's the highest-voted answer on SO. — Wooble, Apr 02 '12 at 17:08
Please provide more information, perhaps a little explanation, what else have you tried, etc... I'd normally -1 this (especially since `blah blah` makes it seem like you put no effort into the question), but won't in this case as you're new(ish) to SO. I do suggest you edit the Q to add more detail to prevent others -1'ing — Basic, Apr 02 '12 at 17:09
`if (!preg_match($regex, $result, $matches)) { die("No matches"); } else { var_dump($matches); }`. Since you get nothing, obviously the regex is finding nothing and you need to fix the regex. — Marc B, Apr 02 '12 at 17:56
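
The check suggested in the last comment can be expanded into a small runnable sketch that separates "the pattern failed" from "the pattern simply did not match". The question never shows `$result` and the start tag is shortened, so both values below are stand-ins chosen for illustration, not the asker's real data.

```php
<?php
// Hypothetical subject: the question does not show what $result contains.
$result = 'FINDTHIS: bla bla the link: <a href="http://www.exemple.net/index.php?p=42/1/">text</a>';

$delimiter = '#';
// Stand-in for the shortened start tag from the question.
$startTag  = '<a href="http://www.exemple.net/index.php?p';
$endTag    = '/1/">';

$regex = $delimiter . preg_quote($startTag, $delimiter)
       . '(.*?)'
       . preg_quote($endTag, $delimiter)
       . $delimiter . 's';

// preg_match() returns 1 on a match, 0 on no match, false on failure.
$found = preg_match($regex, $result, $matches);

if ($found === false) {
    // The regex engine reported an error rather than "no match".
    echo 'preg error code: ' . preg_last_error() . "\n";
} elseif ($found === 0) {
    // Valid pattern, but the subject does not contain the literal tags.
    echo "No match; compare the tags with the raw subject:\n";
    var_dump($result);
} else {
    // $matches[0] is the whole match, $matches[1] the captured text.
    echo 'captured: ' . $matches[1] . "\n";
}
```

If this prints "No match" with the real data, dumping `$result` usually shows whether the quotes, spacing, or entities in the subject differ from the literal tags being searched for.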

score 2 · Answer 1 · 2012-04-03T22:55:39.623

Not sure if I'm reading this right, but something like the code below is comparable to what I think you're trying to do. The caveat is that regex and HTML probably don't mix, but for chunks of HTML text it should be fine.

When looking for a specific attribute value within a tag, I tend to favor a lookahead that locates it anywhere within the tag and is safe enough that it won't overrun the tag's boundaries.

I used preg_match_all() as an example. Test case: http://ideone.com/oerbc
(relative backreference fixed; it should be -2)

<?php
$html = '
  <a href="http://www.exemple.net/index.php?p[some stuff to find]/1/">
  <a href=\'http://www.exemple.net/index.php?p[more stuff to find]/1/ \'>
'; 

$ref_txtstart = 'http://www.exemple.net/index.php?p';
$ref_txtend   = '/1/';

$regex =
'~
<a 
  (?=\s) 
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) href \s*=
      (?>
         \s* ([\'"]) \s*
         ' . preg_quote($ref_txtstart, '~') . '
         (?<core>(?:(?!\g{-2}).)*)
         ' . preg_quote($ref_txtend, '~') . '
         \s* \g{-2}
      )
  )
  \s+ (?:".*?"|\'.*?\'|[^>]*?)+ 
>~xs
';

echo ("$regex\n");
preg_match_all( $regex, $html, $matches, PREG_SET_ORDER );
foreach ($matches as $val) {
   echo( "matched = $val[0]\ncore    = $val[core]\n\n"  );
}
?>

Output

~
<a 
  (?=\s) 
  (?= (?:[^>"']|"[^"]*"|'[^']*')*? (?<=\s) href \s*=
      (?>
         \s* (['"]) \s*
         http\://www\.exemple\.net/index\.php\?p
         (?<core>(?:(?!\g{-2}).)*)
         /1/
         \s* \g{-2}
      )
  )
  \s+ (?:".*?"|'.*?'|[^>]*?)+ 
>~xs

matched = <a href="http://www.exemple.net/index.php?p[some stuff to find]/1/">
core    = [some stuff to find]

matched = <a href='http://www.exemple.net/index.php?p[more stuff to find]/1/ '>
core    = [more stuff to find]

Also

This can be extended to handle unquoted values by using a branch reset group, (?| ... ), and
replacing the named capture group with the fixed index of the capture group in question.

So $val[core] becomes $val[2]. Example: http://ideone.com/IHHLg

Extended regex

$regex =
'~
<a 
  (?=\s) 
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) href \s*=
    (?|
        (?>
           \s* ([\'"]) \s*
           ' . preg_quote($ref_txtstart, '~') . ' ((?:(?!\g{-2}).)*) ' . preg_quote($ref_txtend, '~') . '
           \s* \g{-2}
        )
      |
        (?> 
           (?!\s*[\'"]) \s* ()
           ' . preg_quote($ref_txtstart, '~') . ' ([^\s>]*) ' . preg_quote($ref_txtend, '~') . '
           (?=\s|>)
        )
    )
  )
  \s+ (?:".*?"|\'.*?\'|[^>]*?)+ 
>~xs
';
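
For completeness, here is a self-contained sketch of how the extended pattern might be run. The pattern is reproduced so the snippet stands alone, with the `~` delimiter passed to preg_quote() as a precaution. The second sample tag, with an unquoted href, is a made-up input for illustration and is not taken from the linked test case.

```php
<?php
// Hypothetical sample input: one quoted and one unquoted href value.
$html = '
  <a href="http://www.exemple.net/index.php?p[some stuff to find]/1/">
  <a href=http://www.exemple.net/index.php?p[unquoted-stuff]/1/ >
';

$ref_txtstart = 'http://www.exemple.net/index.php?p';
$ref_txtend   = '/1/';

// Branch reset (?| ... ) makes both alternatives number their groups 1 and 2.
$regex =
'~
<a
  (?=\s)
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) href \s*=
    (?|
        (?>
           \s* ([\'"]) \s*
           ' . preg_quote($ref_txtstart, '~') . ' ((?:(?!\g{-2}).)*) ' . preg_quote($ref_txtend, '~') . '
           \s* \g{-2}
        )
      |
        (?>
           (?!\s*[\'"]) \s* ()
           ' . preg_quote($ref_txtstart, '~') . ' ([^\s>]*) ' . preg_quote($ref_txtend, '~') . '
           (?=\s|>)
        )
    )
  )
  \s+ (?:".*?"|\'.*?\'|[^>]*?)+
>~xs';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

$cores = [];
foreach ($matches as $val) {
    // $val[1] is the quote character (empty for unquoted values),
    // $val[2] is the text between the start and end markers.
    $cores[] = $val[2];
    echo "matched = $val[0]\ncore    = $val[2]\n\n";
}
```

Because of the branch reset, the loop reads `$val[2]` the same way whether or not the attribute value was quoted.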
