I have been struggling with this regular expression for a long time and cannot find a fix. I used JavaScript-based tools to write and test the expression, but when I put it into a PHP page and match with preg, the results are different.

/(<img\b src=)"([^"]+)"(.* class=".*colorme(?:.|[^"]*)"[^>]+>)/

The examples to test are below; the first should not be matched. This all works in JavaScript, but in PHP only the tag with class="colorme" is matched. Am I missing something?

<img src="http://test.jpg" class="then" border="0" width="123" height="83">

<img src="test.jpg" border="0" alt="well watch picture" alt="tersts" class="really colorme" width="228" height="138">

<img src="test.jpeg" class="colorme then" border="0" width="123" height="83">

<img src="test" border="0" width="123" height="83" class="pic colorme then" with="me">

<img src="tests" border="0" class="colorme" width="123" height="83">
Risto Novik
  • Using regex to find HTML elements (of this complexity): **Bad idea**. Use an HTML parser and XPath! Even in JavaScript you can leverage DOM. – Felix Kling Jul 11 '11 at 14:05
  • JavaScript and PHP have different variants of regular expressions. (There are *lots* of different variants of regular expressions.) So it's not surprising that, having got it working in JavaScript, it's not working in PHP -- the syntax is (slightly) different. Separately, since HTML is not a regular language, you cannot use regular expressions, on their own, to reliably process it. You can come close, and perhaps you can make what you're trying to do work well enough for a limited use-case specific to the problem you're solving, but beware. – T.J. Crowder Jul 11 '11 at 14:06
  • I can recommend RegexBuddy: http://www.regexbuddy.com/. It can help you build and test expressions, and once an expression is ready you can see the correct form of it for your selected language. – Andron Jul 11 '11 at 14:29
  • possible duplicate of [How to get href from anchor tag with particular class](http://stackoverflow.com/questions/3232219/how-to-get-href-from-anchor-tag-with-particular-class) – Gordon Jul 11 '11 at 14:40
  • possible duplicate of [DOMDocument need to search for an element that has attribute class](http://stackoverflow.com/questions/3443701/domdocument-need-to-search-for-an-element-that-has-attribute-class-something) – Gordon Jul 11 '11 at 14:42
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Jul 11 '11 at 14:42

2 Answers

With DOM & no fancy expressions...

<?php
$doc = <<<DEMO
<img src="http://test.jpg" class="then" border="0" width="123" height="83">
<img src="test.jpg" border="0" alt="well watch picture" alt="tersts" class="really colorme" width="228" height="138">
<img src="test.jpeg" class="colorme then" border="0" width="123" height="83">
<img src="test" border="0" width="123" height="83" class="pic colorme then" with="me">
<img src="tests" border="0" class="colorme" width="123" height="83">
DEMO;

// Collect parser warnings internally instead of silencing them with @
libxml_use_internal_errors(true);

$xml = new DOMDocument();
// Or, for a locally saved file, you could use:
// $xml->loadHTMLFile('savedfile.html');
$xml->loadHTML($doc);

$images = []; // initialise so print_r() also works when nothing matches
foreach ($xml->getElementsByTagName('img') as $image) {
    // Split the class attribute on whitespace so only the exact class name matches
    $classes = preg_split('/\s+/', trim($image->getAttribute('class')));
    if (in_array('colorme', $classes, true)) {
        $images[] = $image->getAttribute('src');
    }
}
print_r($images);
?>

Outputs:

Array
(
    [0] => test.jpg
    [1] => test.jpeg
    [2] => test
    [3] => tests
)
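
The same selection can also be written as a single XPath query, as the comments on the question suggest. The following is only a minimal sketch: the markup is a shortened, hypothetical version of the question's test cases, and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample markup, shortened from the question's test cases.
$html = <<<HTML
<img src="http://test.jpg" class="then" border="0">
<img src="test.jpg" alt="tersts" class="really colorme" width="228">
<img src="test.jpeg" class="colorme then" border="0">
<img src="tests" border="0" class="colorme" width="123">
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the normalized class list with spaces so "colorme" only matches as a whole class name.
$query = '//img[contains(concat(" ", normalize-space(@class), " "), " colorme ")]';

$sources = [];
foreach ($xpath->query($query) as $img) {
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
```

The padding trick is there because XPath 1.0 has no built-in test for a single class token; a plain contains(@class, "colorme") would also match a class such as "colormenot".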
Lawrence Cherone

In general, no two regex flavors are identical, and JavaScript and PHP differ enough in how they handle patterns that you can't always copy and paste one into the other. I honestly think that using a DOMDocument object with something like XPath would be vastly easier, but for your purposes, regex is absolutely fine. If you are only trying to match a single tag or so, you can always write a valid regular expression; it's only when you try to do more than that that you start to see regex's shortcomings for HTML, which most people seem to forget.
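
If you do stay with a regex, here is a rough sketch of one way to write it for preg_match_all. It assumes that src always comes before class inside the tag and that attribute values are double-quoted, as in the question's examples; the input string and variable names are made up. Every quantifier is limited to [^>]* or [^"]*, so no part of the match can run from one tag into the next, which a greedy .* can do when several tags share a line. Whether that is what went wrong in the original is only a guess.

```php
<?php
// Hypothetical input: several tags on one line.
$html = '<img src="a.jpg" class="then"><img src="b.jpg" alt="x" class="really colorme" width="1">'
      . '<img src="c.jpg" class="colorme then"><img src="d.jpg" class="colorme">';

// [^>]* keeps the match inside one tag, [^"]* inside one attribute value.
// The optional groups allow other class names before or after "colorme",
// separated by whitespace, so "colorme" must appear as a whole class name.
$pattern = '/<img\b[^>]*\bsrc="([^"]+)"[^>]*\bclass="(?:[^"]*\s)?colorme(?:\s[^"]*)?"[^>]*>/i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured src values of the matching tags.
print_r($matches[1]);
```

This still breaks on single-quoted attributes, class before src, or a literal > inside an attribute value, which is exactly where a parser stops being optional.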

So, in conclusion, you should use an HTML parser, but you can use a regular expression; there's no law either way. I would suggest DOM and XPath for this, but if you want to do it with a regex, look at the second answer (the one with a score of 300+) on this page:

Regular expression pattern not matching anywhere in string

shmeeps