
I have a big HTML text to parse with a PHP script. I need to find (and print on screen) the text between:

<span style="color:COLOR">#</span>

and

<span class="close">#</span>

where COLOR can be "red", "green" or "blue".

I wrote three loops, one for each color, so my code at the moment is the following:

preg_match_all("'red\">#</span>(.*?)<span class=\"close\">#</span>'si", $text, $match);
foreach($match[1] as $val) {
     echo $val;
}
preg_match_all("'green\">#</span>(.*?)<span class=\"close\">#</span>'si", $text, $match);
foreach($match[1] as $val) {
     echo $val;
}
preg_match_all("'blue\">#</span>(.*?)<span class=\"close\">#</span>'si", $text, $match);
foreach($match[1] as $val) {
     echo $val;
}

Everything works fine but I have two issues with this:

  1. This way I find all the portions of text between red tags, then all those between green tags, and finally all those between blue tags, but I want to find them in the exact order they appear in the text.
  2. All that repeated code... it makes my heart hurt.

So I needed to find a way to search for all the portions of text using an OR condition.

I wrote this piece of code then:

$patterns = array(
    'green\">#<\/span>(.*?)<span class=\"close\">#<\/span>',
    'red\">#<\/span>(.*?)<span class=\"close\">#<\/span>',
    'blue\">#<\/span>(.*?)<span class=\"close\">#<\/span>'
);

$rule= '/(' .implode('|', $patterns) .')/i'; 

$text = 'Lorem ipsum dolor sit amet, <span style="color:red">#</span>consectetur adipiscing elit<span class="close">#</span>. 
Vestibulum ante lectus, <span style="color:green">#</span>pellentesque ac accumsan sit amet, posuere tempor<span class="close">#</span> ligula.';

preg_match_all($rule, $text, $match);
foreach($match[1] as $val) {
    echo "<pre>".$val."</pre><br />";
}

What I expect to find printed on screen:

consectetur adipiscing elit
pellentesque ac accumsan sit amet, posuere tempor

What I actually get:

red">#consectetur adipiscing elit#
green">#pellentesque ac accumsan sit amet, posuere tempor#

So, I'm obviously doing something wrong with the pattern but I can't find a way to solve this. Any help?

ROMANIA_engineer
  • Obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – deceze Dec 19 '12 at 11:47
  • Very _"funny"_ answer but I did parse my HTML with regex. –  Apr 17 '13 at 11:56

2 Answers


You can use alternation inside a non-capturing group for this, so the text you want stays in capture group 1:

(?:red|green|blue)\">#</span>(.*?)<span class=\"close\">#</span>
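
As a minimal sketch of how this pattern could be dropped into the code from the question: the `~` delimiter is my own choice (an assumption, used so that the `/` in `</span>` needs no escaping), and the `s` and `i` modifiers are carried over from the original three-loop version. Because `(?:...)` does not capture, `$match[1]` should hold only the text between the tags, in the order it appears.

```php
<?php
// Sample text taken from the question.
$text = 'Lorem ipsum dolor sit amet, <span style="color:red">#</span>consectetur adipiscing elit<span class="close">#</span>. 
Vestibulum ante lectus, <span style="color:green">#</span>pellentesque ac accumsan sit amet, posuere tempor<span class="close">#</span> ligula.';

// (?:red|green|blue) is a non-capturing alternation, so the only
// capture group is (.*?) and its contents land in $match[1].
// '~' as delimiter is an arbitrary choice; '#' is then a plain literal.
$rule = '~(?:red|green|blue)">#</span>(.*?)<span class="close">#</span>~si';

preg_match_all($rule, $text, $match);

// A single pass over the text returns the matches in document order.
foreach ($match[1] as $val) {
    echo "<pre>" . $val . "</pre><br />";
}
```

The question's second attempt wrapped the whole alternation in an outer `( ... )`, which made group 1 the entire match; here the color names are the only part that alternates, so no outer group is needed.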
Olaf Dietsche

Does it have to be 'red', 'green', or 'blue'? Can't it be 'color:ANYTHING'?

 preg_match_all("'color:.+?\">#</span>(.*?)<span class=\"close\">#</span>'si", $text, $match);
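
A small sketch of the "any color" idea, under a couple of assumptions: the input string is made up for illustration (the "orange" span is hypothetical), and I have swapped the `.+?` above for `[^"]+` so that the color value cannot run past the closing quote of its attribute. That swap is my own variation, not part of the pattern above.

```php
<?php
// Hypothetical input: the "orange" span is invented to show a color
// outside the red/green/blue set.
$text = '<span style="color:red">#</span>first<span class="close">#</span> and '
      . '<span style="color:orange">#</span>second<span class="close">#</span>';

// [^"]+ matches one or more characters up to the attribute's closing quote
// (an assumption swapped in for .+?); (.*?) is still the only capture group.
$rule = '~color:[^"]+">#</span>(.*?)<span class="close">#</span>~si';

preg_match_all($rule, $text, $match);
print_r($match[1]);
```

Whether to accept any color or only a fixed list depends on how much you trust the markup: the open-ended version will also pick up marker spans in colors you did not plan for.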
Naryl