trim/delete Everything after DIV with ID

Question

In the testing environment, $html is 20 to 30 or more lines of HTML produced by a cURL (scrape) query to another page/site, but for simplicity in the question I reduced it to the simple example below.

I need to echo the DIV with ID "keepthis" and all its content with the HTML structure intact, but delete everything before it and after it. The DIV with ID "deletethis" will always have that ID. I have looked at multiple posts involving substr / explode / trim, but I cannot find, or get to work, a method that deletes everything TO THE RIGHT in $html starting from position 0 of the "deletethis" DIV.

That DIV is not located at a fixed number of characters into the code. I am able to get the "delete everything before DIV(keepthis)" part to work, just not the other side. Any help would be appreciated.

$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';
$x = substr($html, strpos($html, '<div id="keepthis">')); //cleans up the BEFORE code
echo $x;

Why not parse the code into a DOM document? What regular expression patterns have you tried so far? How are they failing? — crush, Jul 31 '14 at 17:14
Using DOMDocument and DOMXpath you can certainly do that ... [hint](http://stackoverflow.com/questions/5126967/extract-dom-elements-from-string-in-php) — Ko2r, Jul 31 '14 at 17:20
@AvinashRaj That honestly shouldn't matter. It's just bad practice. It's not what it was designed for... — War10ck, Jul 31 '14 at 17:27
@Ko2r I looked at the hint/link you provided. My short answer is that it's exceeding my working knowledge for the moment. I'll try to work with it further and see if I can grasp it enough to use it. — DMSJax, Jul 31 '14 at 17:42

score 0 · Answer 1 · answered Jul 31 '14 at 17:54

So, based on the link, try this:

$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$result = $xpath->query('//div[@id="keepthis"]');
if ($result->length > 0) {
    var_dump($result->item(0)->nodeValue);
}

Warning: nodeValue returns only the text content, without the tags, but you can iterate through the child nodes of $result->item(0) to get them.
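As a hedged follow-up to the warning above: instead of iterating over child nodes, one option is to pass the matched node to `DOMDocument::saveHTML()`, which serializes just that node together with its markup. The sketch below reuses the question's sample `$html`; the variable name `$kept` and the use of `libxml_use_internal_errors()` to quiet parser warnings are assumptions added for illustration, not part of the original answer.

```php
<?php
// Sketch only: keep the "keepthis" DIV with its tags, drop everything else.
$html = '<h1>hello world</h1><div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // scraped HTML is rarely valid; collect warnings quietly
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$result = $xpath->query('//div[@id="keepthis"]');

$kept = ''; // hypothetical name for the extracted markup
if ($result->length > 0) {
    // Passing a node to saveHTML() serializes that node and its children only.
    $kept = $dom->saveHTML($result->item(0));
}
echo $kept;
```

Because the DIV is selected by its ID, nothing has to be known about where "deletethis" (or any other trailing markup) starts in the string.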

answered Jul 31 '14 at 17:54

Ko2r


Thank you, let me try to provide a better example: [link](http://mysitedesign.net/test2.php) <-- everything after the final HVAC unit should be removed. – DMSJax Jul 31 '14 at 18:17
`$x=""; $url = "http://www.trane.com/residential/en/products/heating-and-cooling/air-conditioners.html"; $curl = curl_init($url); curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); $output = curl_exec($curl); curl_close($curl); $DOM = new DOMDocument; libxml_use_internal_errors(true); $DOM->validateOnParse = true; @$DOM->loadHTML($output); libxml_use_internal_errors(false); $DOM->normalizeDocument(); $html = $DOM->saveHTML(); $x = substr($html, strpos($html, '
')); $x = str_replace('
– DMSJax Jul 31 '14 at 18:18

score -2 · Answer 2 · edited Jul 31 '14 at 17:21

string rtrim ( string $str [, string $character_mask ] )

This function returns a string with whitespace stripped from the end of str.

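Without the second parameter, rtrim() will strip these characters: " " (space), "\t" (tab), "\n" (newline), "\r" (carriage return), "\0" (NUL byte), and "\x0B" (vertical tab).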

edited Jul 31 '14 at 17:21

War10ck


answered Jul 31 '14 at 17:20

Sagar.P.Waghmare


I'm not sure how `rtrim()` will help here. The function operates on characters, and *not* strings. So `rtrim($str, '');` will not do what you think it will. – Amal Murali Jul 31 '14 at 17:23
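To illustrate the point in the comment above, here is a minimal sketch of how `rtrim()` treats its second argument as a set of single characters, followed by a plain `strpos()`/`substr()` cut at a known marker. The sample strings and the names `$marker`, `$pos`, `$trimmed`, and `$kept` are assumptions for illustration; the marker matching assumes the "deletethis" DIV always opens with exactly `<div id="deletethis"`, which real scraped markup may not guarantee.

```php
<?php
// rtrim() strips any trailing characters found in the mask; it does not match a substring.
$trimmed = rtrim('<div>david</div>', '</div>');
// The closing tag goes, but so does the trailing "vid" of "david",
// because d, i and v are all in the mask.
echo $trimmed, "\n"; // <div>da

// Cutting at a known marker instead: keep everything before the first occurrence.
$html = '<div id="keepthis"> Sample content</div><div id="deletethis">a bunch of other dynamic html here</div>';
$marker = '<div id="deletethis"';
$pos = strpos($html, $marker);

// strpos() returns false when the marker is absent; leave the string alone in that case.
$kept = ($pos === false) ? $html : substr($html, 0, $pos);
echo $kept, "\n"; // <div id="keepthis"> Sample content</div>
```

The strict `=== false` check matters because a marker at position 0 would otherwise be mistaken for "not found".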
