In the testing environment, $html is 20 to 30 lines or more of HTML created by a cURL (scrape) query to another page/site, but for simplicity I reduced it to this simple example in the question:

I need to echo the DIV with ID "keepthis" and all its content with the HTML structure intact, but delete everything before and after it. The DIV with ID "deletethis" will always have that ID. I have looked at multiple posts involving substr / explode / trim, but I cannot find (or get to work) a method that deletes everything to the right in $html, starting from position 0 of <div id="deletethis">.

That div (deletethis) is not located at a fixed number of characters into the code. I am able to get the "delete everything before DIV (keepthis)" part to work, just not the other side. Any help would be appreciated.

$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';
$x = substr($html, strpos($html, '<div id="keepthis">')); //cleans up the BEFORE code
echo $x;
DMSJax
  • Why not parse the code into a DOM document? What regular expression patterns have you tried so far? How are they failing? – crush Jul 31 '14 at 17:14
  • Using DOMDocument and DOMXpath you can certainly do that ... [hint](http://stackoverflow.com/questions/5126967/extract-dom-elements-from-string-in-php) – Ko2r Jul 31 '14 at 17:20
  • @DMS you mean this http://regex101.com/r/jM2lE0/2 ? – Avinash Raj Jul 31 '14 at 17:23
  • @AvinashRaj Regex ***should not*** be used to parse HTML... – War10ck Jul 31 '14 at 17:24
  • @War10ck it won't be a big problem for shorter html code. – Avinash Raj Jul 31 '14 at 17:26
  • @AvinashRaj That honestly shouldn't matter. It's just bad practice. It's not what it was designed for... – War10ck Jul 31 '14 at 17:27
  • @Ko2r I looked at the hint/link you provided. My short answer is that it exceeds my working knowledge for the moment. I'll try to work with it further and see if I can grasp it enough to use it. – DMSJax Jul 31 '14 at 17:42
  • @DMSJax Done for you and tested it works – Ko2r Jul 31 '14 at 17:56

2 Answers

So, based on the link, try this:

$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$result = $xpath->query('//div[@id="keepthis"]');
if ($result->length > 0) {
    var_dump($result->item(0)->nodeValue);
}

Warning: nodeValue returns only the text content, without the tags, but you can iterate through the child nodes of $result->item(0) to get them.
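
If the goal is to echo the matched DIV with its tags intact, one option that avoids walking the child nodes by hand is to pass the node itself to saveHTML(). The following is only a minimal sketch, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument) and the simplified $html from the question; the variable name $kept is made up for illustration.

```php
<?php
$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';

// Scraped HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$result = $xpath->query('//div[@id="keepthis"]');

$kept = ''; // hypothetical name: holds the serialized DIV, or '' if it was not found
if ($result !== false && $result->length > 0) {
    // With a node argument, saveHTML() serializes only that node, tags included.
    $kept = $dom->saveHTML($result->item(0));
}

echo $kept;
```

Because only the selected node is serialized, everything before and after the DIV is dropped without needing to know where "deletethis" starts.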

Ko2r
  • Thank you, let me try to provide a better example: [link](http://mysitedesign.net/test2.php) <-- everything after the final HVAC unit should be removed. – DMSJax Jul 31 '14 at 18:17
  • `$x=""; $url = "http://www.trane.com/residential/en/products/heating-and-cooling/air-conditioners.html"; $curl = curl_init($url); curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); $output = curl_exec($curl); curl_close($curl); $DOM = new DOMDocument; libxml_use_internal_errors(true); $DOM->validateOnParse = true; @$DOM->loadHTML($output); libxml_use_internal_errors(false); $DOM->normalizeDocument(); $html = $DOM->saveHTML(); $x = substr($html, strpos($html, '
    ')); $x = str_replace('
    – DMSJax Jul 31 '14 at 18:18
-2
string rtrim ( string $str [, string $character_mask ] )

This function returns a string with whitespace stripped from the end of str.

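Without the second parameter, rtrim() will strip these characters: " " (space), "\t" (tab), "\n" (line feed), "\r" (carriage return), "\0" (NUL byte) and "\x0B" (vertical tab).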

War10ck
  • I'm not sure how `rtrim()` will help here. The function operates on characters, and *not* strings. So `rtrim($str, '');` will not do what you think it will. – Amal Murali Jul 31 '14 at 17:23
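
To illustrate the point about characters versus strings, here is a rough sketch. The first part shows rtrim() treating its second argument as a set of characters. The second part is a plain strpos()/substr() cut, which is what the question was attempting. It assumes both marker strings appear in $html exactly as written (same attribute order and quoting) and that the "deletethis" DIV comes after the "keepthis" DIV, so it is fragile compared with a DOM parser.

```php
<?php
// rtrim() strips any trailing characters found in the mask, not a trailing substring.
$trimmed = rtrim('hello world', 'dlo'); // removes the trailing "d" and "l", stops at "r"

$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';

// Locate both markers; strpos() returns false when a marker is missing.
$start = strpos($html, '<div id="keepthis">');
$end   = strpos($html, '<div id="deletethis">');

$x = '';
if ($start !== false && $end !== false && $end > $start) {
    // Keep only the slice from the start of "keepthis" up to the start of "deletethis".
    $x = substr($html, $start, $end - $start);
}

echo $x;
```

The third argument of substr() is a length, not an end position, which is why the slice uses $end - $start.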