The following code works fine:

<?php

function clearPage($content, $class) {
    // Map the pattern to its replacement: keep only the matching div.
    $arr = array(
        '@^(.*?)<div class="'.$class.'">(.*?)</div>(.*?)$@i' => '<div class="'.$class.'">$2</div>'
    );

    return preg_replace(array_keys($arr), array_values($arr), $content);
}

$class = "something";
// Single quotes outside so the double quotes in the HTML need no escaping.
$content = '31xu1823y8<div class="something">Wanted</div>912u38u3';
$result = clearPage($content, $class);
echo $result;
?>

This outputs:

<div class="something">Wanted</div>

But I want the variable $content to hold the HTML code of a web page, so I changed the last part to something like:

$class = "something";
$content = file_get_contents('index.php');
$result = clearPage($content, $class);
echo $result;

This outputs the whole web page! Why?
Afonso Matos
  • I'm guessing you'll find an answer in this SO question -> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – adeneo Jun 28 '13 at 18:20
  • And yes, if you read from `input.php`, you'll not get rendered HTML, but PHP source with only HTML snippets. (Regex problem: DOTALL flag.) – mario Jun 28 '13 at 18:22
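To illustrate the DOTALL point from the comment above: without the `s` modifier, `.` does not match newlines, so on a multi-line page the pattern never matches and preg_replace returns the input untouched. A minimal sketch, assuming a made-up multi-line `$content` in place of a real page (the variable names `$withoutS`, `$withS`, and `$keep` are only for illustration):

```php
<?php
// Hypothetical multi-line page, standing in for real fetched HTML.
$content = "<html>\n<body>\n<div class=\"something\">Wanted</div>\n</body>\n</html>";
$class = 'something';

// Same pattern as in the question. "." stops at newlines, so "^(.*?)" cannot
// get from the start of the string to the div, and nothing is replaced.
$withoutS = '@^(.*?)<div class="' . preg_quote($class, '@') . '">(.*?)</div>(.*?)$@i';

// Appending the "s" (PCRE_DOTALL) modifier lets "." match newlines as well.
$withS = $withoutS . 's';

$keep = '<div class="' . $class . '">$2</div>';

$unchanged = preg_replace($withoutS, $keep, $content); // no match: input comes back as-is
$extracted = preg_replace($withS, $keep, $content);    // match: only the div remains

echo $extracted;
```

This only addresses the regex side. If the file being read is PHP source rather than rendered HTML, the div may not be present in the expected form at all.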

1 Answer

Using regex or string matching is the worst way to parse HTML.

You need to use the DOM: http://php.net/manual/en/book.dom.php

or a third-party DOM library such as PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/
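As a rough sketch of the DOM approach with PHP's built-in extension: the helper name `extractDivByClass` and the sample `$html` are made up for illustration, and `$class` is assumed to be a plain class name with no quote characters, since it is placed straight into the XPath query.

```php
<?php
// Hypothetical helper: returns the outer HTML of the first <div> carrying
// the given class, or null when no such div exists.
function extractDivByClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the space-separated attribute.
    // Assumption: $class contains no quotes (it is not escaped here).
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML() with a node argument serializes just that node.
    return $doc->saveHTML($nodes->item(0));
}

// Sample input spanning several lines, which the DOM parser handles
// without any special flags.
$html = "<html><body>\n<p>31xu1823y8</p>\n<div class=\"something\">Wanted</div>\n</body></html>";
$result = extractDivByClass($html, 'something');
echo $result;
```

Unlike the regex, this does not care about newlines, attribute order, or extra classes on the div, and nested divs inside the target are kept intact.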

TravisO
  • Why not closevote to one of the hundreds of similar questions with likewise unspecific answers? – mario Jun 28 '13 at 18:19
  • I think my solution is OK. The problem, I think, is that I am not getting the HTML code from the page, so it doesn't replace anything. – Afonso Matos Jun 28 '13 at 18:20