<table class="trailer">

------------------Begin---------------------
<tbody><tr>
<td class="newtrailer-text">
Trailer 2<br>
</td></tr>
<br>
<b>(Yahoo)</b><br>
<b>(High Definition)</b><br>
<a href="http://playlist.yahoo.com/makeplaylist.dll?sid=107193280&amp;sdm=web&amp;pt=rd">(1080p)</a><br>
<a href="http://playlist.yahoo.com/makeplaylist.dll?sid=107193279&amp;sdm=web&amp;pt=rd">(720p)</a><br>
<a href="http://playlist.yahoo.com/makeplaylist.dll?sid=107193272&amp;sdm=web&amp;pt=rd">(480p)</a><br>
<br>
<b>(Warner Bros.)</b><br>
<b>(High Definition)</b><br>
<a href="http://pdl.warnerbros.com/wbmovies/inception/trl_3/Inception_TRLR3_1080.mov">(1080p)</a><br>
<a href="http://pdl.warnerbros.com/wbmovies/inception/trl_3/Inception_TRLR3_720.mov">(720p)</a><br>
<a href="http://pdl.warnerbros.com/wbmovies/inception/trl_3/Inception_TRLR3_480.mov">(480p)</a>=
--------------END----------------

</tbody></table>

How would I get all the data between begin and end? I've tried the following with no results. Any help would be appreciated. Thanks.

$regex = '#<td class="newtrailer-text">([^"]+)</tbody></table>#si';
  • 1
    What exactly are you trying to achieve? What do you want to do with the data? (Since there are probably better/cleaner ways to achieve the same). – Oldskool Feb 12 '12 at 18:57
  • 3
    Use a DOM parser, not regular expressions. –  Feb 12 '12 at 18:57

3 Answers

2

Here's the canonical link for why you should use DOM to parse (X)HTML: The pony, he comes.
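As a rough sketch of the DOM approach, the snippet below uses PHP's bundled DOMDocument and DOMXPath classes to collect the trailer links. The markup is a simplified, hypothetical version of the table in the question (the links are placed inside the cell and the Begin/END marker lines are left out), so treat it as an illustration rather than a drop-in solution.

```php
<?php
// Simplified sample markup (an assumption, not the exact page from the question).
$html = <<<'HTML'
<table class="trailer"><tbody><tr>
<td class="newtrailer-text">
Trailer 2<br>
<a href="http://pdl.warnerbros.com/wbmovies/inception/trl_3/Inception_TRLR3_1080.mov">(1080p)</a><br>
<a href="http://pdl.warnerbros.com/wbmovies/inception/trl_3/Inception_TRLR3_720.mov">(720p)</a>
</td></tr></tbody></table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// Map each link label, e.g. "(1080p)", to its URL.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//table[@class="trailer"]//a') as $a) {
    $links[trim($a->textContent)] = $a->getAttribute('href');
}

print_r($links);
```

Because the parser works on the element tree, quotes inside attributes and line breaks inside the cell are not a problem here.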

But here's the deal with your regex:

([^"]+) matches only characters that are not a double quote, so it stops at the first ". Your regex therefore requires everything between the opening <td> tag and </tbody></table> to be free of double quotes. The href attributes in your markup contain quotes, so no match is found.

Instead, try:

$regex = '#<td class="newtrailer-text">(.+)</tbody></table>#siU';

if (preg_match($regex, $str, $m)) {
  echo $m[1];
} else {
  echo 'No match';
}
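
To make the difference concrete, here is a small comparison of the two patterns on a shortened input. The input string and the example.com URL are made up for illustration; only the two regexes come from this thread.

```php
<?php
// Shortened, hypothetical input containing a quoted attribute.
$str = '<td class="newtrailer-text">Trailer 2<br>'
     . '<a href="http://example.com/a.mov">(1080p)</a></tbody></table>';

$original = '#<td class="newtrailer-text">([^"]+)</tbody></table>#si';
$fixed    = '#<td class="newtrailer-text">(.+)</tbody></table>#siU';

// [^"]+ cannot get past the quote in href="...", so this returns 0.
$originalMatched = preg_match($original, $str, $m1);

// .+ with the s modifier also matches quotes and newlines, so this returns 1.
$fixedMatched = preg_match($fixed, $str, $m2);

var_dump($originalMatched, $fixedMatched);
echo $m2[1], "\n";
```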
Community
  • Thank you, I will look into DOM parsers – user1204679 Feb 12 '12 at 19:03
  • +1 for being way faster to explain.. typing with wireless mini-keyboard in front of my tv screen sucks ;-) – Kaii Feb 12 '12 at 19:03
  • but you should really make that regex ungreedy using U modifier! – Kaii Feb 12 '12 at 19:04
  • 1
    @Kaii You're absolutely right in principle. If there aren't multiple occurrences of `</tbody></table>` (as in the posted code), though, it makes no difference. All the same, vigilance is always welcome. Updating :) Oh, and I feel your pain on the mini-keyboard. I used to try to answer questions on my phone. Then the downvotes came for slowness and autocorrect. I don't do that anymore. –  Feb 12 '12 at 19:07
2
$regex = '#<td class="newtrailer-text">(.+)</tbody></table>#Usi';
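
A quick way to see what the U modifier changes is to run the same pattern with and without it on input that contains the closing tags twice. The two-cell input below is invented for the demonstration.

```php
<?php
// Hypothetical input with two cells, each followed by the closing tags.
$str = '<td class="newtrailer-text">A</tbody></table>'
     . '<td class="newtrailer-text">B</tbody></table>';

// Without U, .+ is greedy and runs to the LAST </tbody></table>.
preg_match('#<td class="newtrailer-text">(.+)</tbody></table>#si', $str, $greedy);

// With U, quantifiers are ungreedy by default and stop at the FIRST one.
preg_match('#<td class="newtrailer-text">(.+)</tbody></table>#Usi', $str, $lazy);

echo $greedy[1], "\n"; // A</tbody></table><td class="newtrailer-text">B
echo $lazy[1], "\n";   // A
```

Writing `(.+?)` without the U modifier should give the same ungreedy behaviour for that one quantifier.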
Kaii
1

You can use a non-greedy regex like this:

if (preg_match_all('#------------------Begin---------------------(.*?)--------------END----------------#s', $str, $m) )
   print_r ( $m[1] );
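
If the marker lines really are part of the input, one possible variation is to build the pattern from the marker strings with preg_quote, so they never have to be escaped by hand. The sample text and the variable names below are assumptions made for this sketch.

```php
<?php
$begin = '------------------Begin---------------------';
$end   = '--------------END----------------';

// Hypothetical input containing two marked blocks.
$str = "x\n$begin\nfirst\n$end\ny\n$begin\nsecond\n$end\n";

// preg_quote escapes regex metacharacters and the chosen delimiter (#).
$pattern = '#' . preg_quote($begin, '#') . '(.*?)' . preg_quote($end, '#') . '#s';

preg_match_all($pattern, $str, $m);

// Strip the newlines that surround each captured block.
$blocks = array_map('trim', $m[1]);
print_r($blocks);
```

With the non-greedy `(.*?)`, each Begin marker is paired with the nearest following END marker, so every block is captured separately.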
anubhava