DOMDocument: how to get inner HTML as strings separated by line breaks?

Question

<blockquote>
 <p>
   2 1/2 cups sweet cherries, pitted<br>
   1 tablespoon cornstarch <br>
   1/4 cup fine-grain natural cane sugar
 </p>
</blockquote>

Hi, I want to get the text inside the 'p' tag. As you can see, there are three different lines, and I want to print them separately after adding some extra text to each line. Here is my code block:

    $tags = $dom->getElementsByTagName('blockquote');
    foreach($tags as $tag)
    {
        $datas = $tag->getElementsByTagName('p');
        foreach($datas as $data)
        {
            $line = $data->nodeValue;
            echo $line;
        }
    }

The main problem is that $line contains the full text inside the 'p' tag, including the 'br' tags. How can I separate the three lines and handle each one individually?

Thanks in advance.

Related to the previous innerHTML reference in the title: [innerHTML in PHP's DomDocument?](http://stackoverflow.com/q/2087103/367456) — hakre, Jun 23 '13 at 18:39

score 2 · Accepted Answer · edited May 23 '17 at 11:50

You can do that with XPath. All you have to do is query the text nodes. No need to explode or something like that:

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);
    foreach ($xp->query('/html/body/blockquote/p/text()') as $textNode) {
        echo "\n<li>", trim($textNode->textContent);
    }

The non-XPath alternative would be to iterate the children of the P tag and only output them when they are DOMText nodes:

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    foreach ($dom->getElementsByTagName('p')->item(0)->childNodes as $pChild) {
        if ($pChild->nodeType === XML_TEXT_NODE) {
            echo "\n<li>", trim($pChild->textContent);
        }
    }

Both will output:

    <li>2 1/2 cups sweet cherries, pitted
    <li>1 tablespoon cornstarch
    <li>1/4 cup fine-grain natural cane sugar
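Since the question asks for extra text added to each line, the XPath approach can be wrapped in a small function that returns the lines as an array. This is only a sketch: collectParagraphLines() and the "Ingredient N:" prefix are made-up names for illustration, and it assumes the blockquote is a direct child of body, as in the sample markup. It also skips whitespace-only text nodes, which can appear between consecutive 'br' tags.

```php
<?php
/**
 * Return the trimmed text nodes that are direct children of
 * /html/body/blockquote/p, skipping whitespace-only nodes.
 * Hypothetical helper, not part of the DOM extension.
 */
function collectParagraphLines(string $html): array
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $xp = new DOMXPath($dom);
    $lines = [];
    foreach ($xp->query('/html/body/blockquote/p/text()') as $textNode) {
        $text = trim($textNode->textContent);
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    return $lines;
}

// Sample markup from the question.
$html = <<<HTML
<blockquote>
 <p>
   2 1/2 cups sweet cherries, pitted<br>
   1 tablespoon cornstarch <br>
   1/4 cup fine-grain natural cane sugar
 </p>
</blockquote>
HTML;

foreach (collectParagraphLines($html) as $i => $line) {
    // "Ingredient N: " stands in for whatever extra text is needed.
    echo 'Ingredient ', $i + 1, ': ', $line, "\n";
}
```

Returning an array instead of echoing inside the loop keeps the extraction separate from the formatting, so each line can be processed individually afterwards.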

Also see DOMDocument in php for an explanation of the node concept. It's crucial to understand when working with DOM.

answered Aug 28 '11 at 17:52

Gordon


I am trying to access this page ([link](http://www.101cookbooks.com/archives/cherry-cobbler-recipe.html)) using this code ([link](http://codepad.viper-7.com/9eEEkZ)), but it is not working. Would you please check it? – Quazi Marufur Rahman Aug 29 '11 at 06:32
@qmaruf The XPath in my example is a direct path to the first text nodes that are direct children of a p element that is a direct child of a blockquote element that is a direct child of the body element. You have to adjust it for your website. See http://codepad.viper-7.com/xk4H4S – Gordon Aug 29 '11 at 07:01
Thanks. I have adjusted it. This is the XPath I got from Firebug: /html/body/div/div[3]/div/div[3]/blockquote/p[2]/text() – Quazi Marufur Rahman Aug 29 '11 at 07:24
@qmaruf Do not use the XPath returned by Firebug. If there are scripts on the page that alter the DOM, the XPath might be inaccurate. Also, browsers will add implied elements where necessary. – Gordon Aug 29 '11 at 07:32
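To illustrate the point made in these comments, here is a rough sketch comparing the absolute path from the answer with a path that uses the descendant axis (//). The page markup, including the wrapper div elements and their attributes, is invented for the example and is not the actual recipe page.

```php
<?php
// Hypothetical page: the blockquote sits inside wrapper <div>s.
$html = <<<HTML
<html><body>
<div id="content"><div class="recipe">
<blockquote>
<p>2 1/2 cups sweet cherries, pitted<br>
1 tablespoon cornstarch<br>
1/4 cup fine-grain natural cane sugar</p>
</blockquote>
</div></div>
</body></html>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Absolute path from the answer: expects blockquote directly under body.
$absolute = $xp->query('/html/body/blockquote/p/text()');

// Descendant axis: matches a blockquote at any depth in the document.
$relative = $xp->query('//blockquote/p/text()');

echo $absolute->length, "\n"; // 0
echo $relative->length, "\n"; // 3
```

A path such as //blockquote/p/text() does not depend on the exact chain of wrapper elements, so it is less likely to break when the page layout differs from what a browser tool reports. It will, however, match every blockquote on the page, so it may need a predicate to narrow it down.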

score 1 · Answer 2 · answered Aug 28 '11 at 17:10

You can use:

    $lines = explode('<br>', $data->nodeValue);
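One caveat worth checking before relying on this: nodeValue returns the concatenated text of the element, without any tags, so the '<br>' delimiter may never appear in it. The sketch below uses a shortened, made-up paragraph to show what nodeValue holds, and then splits the serialized markup from saveHTML() instead. Passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<p>one<br>two<br>three</p>');
$p = $dom->getElementsByTagName('p')->item(0);

// nodeValue is text only: the <br> tags are gone, so explode() finds
// no delimiter and returns a single-element array.
echo $p->nodeValue, "\n";                        // onetwothree
echo count(explode('<br>', $p->nodeValue)), "\n"; // 1

// Serializing the node keeps the markup, so the delimiter is present.
$markup = $dom->saveHTML($p);

// Split on <br>, then strip the surrounding <p> tags from each piece.
$parts = array_map('strip_tags', explode('<br>', $markup));

print_r($parts);
```

Splitting serialized markup depends on how the tags are written out, which is one reason the text-node approaches in the accepted answer tend to be more robust.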

Andrej


score 0 · Answer 3 · answered Aug 28 '11 at 17:11

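Here is a solution in JavaScript syntax, assuming line holds the paragraph's HTML markup as a string:

    // split the markup on each <br> tag
    var lines = line.split("<br>");

    console.log(lines[0]);
    console.log(lines[1]);
    console.log(lines[2]);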

dov.amir


score -2 · Answer 4 · answered Aug 28 '11 at 17:11

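You can use PHP's explode function like this (assuming each line in your <p> tag ends with <br>). Because nodeValue returns only the text without tags, the element's markup is serialized with saveHTML first:

    $tags = $dom->getElementsByTagName('blockquote');
    foreach($tags as $tag)
    {
        $datas = $tag->getElementsByTagName('p');
        foreach($datas as $data)
        {
            // nodeValue holds text only, so serialize the markup to keep <br>
            $contents = $dom->saveHTML($data);
            $lines = explode('<br>', $contents);
            foreach($lines as $line) {
                // drop the surrounding <p> tags and stray whitespace
                echo trim(strip_tags($line)), "\n";
            }
        }
    }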
