Hi, I'm just trying to get the hang of regular expressions. I have been trying to extract content from this website, but I reckon I have a problem with my regexp, as I cannot add anything to the array. Can anyone point me in the right direction? I reckon it's just something small.

Thanks

<?php   
    $f1 = fopen("http://www.irishexaminer.com/","r");
    $document = fread($f1,100000);
    fclose($f1);
    $regexp = "%<p>(.+)</p><p>%";
    preg_match($regexp,$document,$getHeading);  
    echo "<br>" . $getHeading[1];
    echo '<pre>';
    print_r($getHeading);
    echo '</pre>';
?>
JLRishe

1 Answer

In your case the problem is white space in the closing tag of the p element, so a pattern that expects a literal </p> never matches. This is what the page contains:

<p> THERE is no excuse for loyalist violence on the streets of Belfast.<p /><p>

A regex that allows white space inside the closing tag:

%<p>(.+)</\s*p><p>%
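
As a rough sketch of how that idea could be tested offline, the snippet below runs a slightly more tolerant variant of the pattern against a hard-coded sample instead of the live page. The sample string, the variable names, and the extra alternative that accepts the `<p />` form shown above are all assumptions for illustration, not something taken from the real site. It also uses a lazy `.+?` and the `s` modifier, since a greedy `.+` would run to the last closing tag and `.` does not match newlines by default.

```php
<?php
// Hypothetical sample input; the markup of the real page may differ.
$document = '<p> THERE is no excuse for loyalist violence on the streets of Belfast.<p /><p>'
          . 'Second paragraph.</ p><p>Third paragraph.</p>';

// (.+?)            lazy capture, stops at the first closing tag it can find
// <\s*/\s*p\s*>    accepts </p>, </ p>, < /p >
// <p\s*/\s*>       accepts the <p /> form from the sample (assumed to act as a close)
// s modifier       lets . match newlines inside a paragraph
$regexp = '%<p>(.+?)(?:<\s*/\s*p\s*>|<p\s*/\s*>)%s';

$headings = [];
if (preg_match_all($regexp, $document, $matches)) {
    // $matches[1] holds the text captured by the first group of every match
    $headings = array_map('trim', $matches[1]);
}

echo '<pre>';
print_r($headings);
echo '</pre>';
```

preg_match_all is used here rather than preg_match so that every paragraph lands in the array, not only the first one.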

It would take a while to make a regex resilient enough for HTML. Take Frankie's advice too: invest your effort in something less prone to failure. You can use PHP HTML Tidy.
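
To illustrate what a parser-based approach might look like, here is a minimal sketch using PHP's built-in DOMDocument class. Note that this is a different tool from the Tidy extension mentioned above, swapped in because it ships with most PHP builds. The sample markup and variable names are made up for the example.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body><p> First headline.</p><p>Second headline.</p>'
      . '<div>Not a paragraph.</div></body></html>';

// Collect parser warnings about sloppy markup instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Walk every <p> element and keep its text, whatever the tag spacing was.
$paragraphs = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

echo '<pre>';
print_r($paragraphs);
echo '</pre>';
```

The parser decides where each paragraph starts and ends, so the code no longer depends on how the closing tags happen to be written.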

Robert Cutajar