
How can I remove everything from a page except the text inside <p> tags?

Page:

  This is text.
  <div class="text">This is text in 'div' tag</div>
  <p>This is text in 'p' tag</p>

Expected result:

This is text in 'p' tag

Greetings.

w00d
J. Doe

2 Answers


Basically, you'll have to parse the markup. PHP comes with a good parser in the form of the DOMDocument class, so that's really quite easy:

$dom = new DOMDocument;
$dom->loadHTML($htmlString);

Next, get all p tags:

$paragraphs = $dom->getElementsByTagName('p');

This method returns a DOMNodeList object, which implements the Traversable interface, so you can use it as an array of DOMNode instances (DOMElement in this case):

$first = $paragraphs->item(0);//or $paragraphs[0] even
foreach ($paragraphs as $p) {
    echo $p->textContent;//echo the inner text
}
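
Putting the steps above together, here is a minimal sketch run against the sample from the question. It assumes the sample is wrapped in html and body tags so that it forms a complete page, and `extractParagraphTexts` is a hypothetical helper name, not part of DOMDocument:

```php
<?php
// Hypothetical helper: returns the text of every <p> element, in document order.
function extractParagraphTexts(string $html): array
{
    // Real-world markup is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName('p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// The sample from the question, wrapped as a full page (an assumption).
$htmlString = <<<'HTML'
<html><body>
This is text.
<div class="text">This is text in 'div' tag</div>
<p>This is text in 'p' tag</p>
</body></html>
HTML;

$texts = extractParagraphTexts($htmlString);
echo implode("\n", $texts), "\n"; // prints: This is text in 'p' tag
```

The bare text and the div are simply never visited, so nothing has to be removed explicitly.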

If you only want the paragraph elements that do not contain child elements, then you can easily check that:

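foreach ($paragraphs as $p) {
    // Text is a child node too, so hasChildNodes() is true for any non-empty
    // paragraph; look for nested elements instead.
    if ($p->getElementsByTagName('*')->length === 0) {
        echo $p->textContent; // or $p->nodeValue
    }
}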
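
One thing to watch for with this kind of check is that the text inside a paragraph is itself a child node (a DOMText), so a test for child elements has to look at node types. A small sketch, assuming a two-paragraph sample document; `hasChildElements` is a hypothetical helper, not a DOM method:

```php
<?php
// Hypothetical helper: true when $node has at least one element among its direct children.
function hasChildElements(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            return true;
        }
    }
    return false;
}

$dom = new DOMDocument;
$dom->loadHTML('<html><body><p>Plain text</p><p>Text with <b>bold</b> part</p></body></html>');

$plainOnly = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    // hasChildNodes() is true for both paragraphs, because text counts as a child node.
    if ($p->hasChildNodes() && !hasChildElements($p)) {
        $plainOnly[] = $p->textContent;
    }
}

echo implode("\n", $plainOnly), "\n"; // prints: Plain text
```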

A closely related answer with some more links/info: How to split an HTML string into chunks in PHP?

Elias Van Ootegem

You can do this with PHP's native strip_tags function, like so:

strip_tags("<p>This is text in 'p' tag</p>"); 

This returns the expected result, "This is text in 'p' tag". NOTE: on its own this only helps when the input already contains nothing but the paragraph. For a whole page you need an outer container div and a bit of dirty RegExp to remove the unwanted elements (e.g. the div tag) together with their contents, not just the tags themselves.

strip_tags takes one required argument and one optional argument. The first is the string to strip tags from; the second is a string listing the allowable tags, which are left in place. For more information, see the strip_tags documentation.
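
To illustrate the two arguments, here is a short sketch that applies strip_tags to the whole sample from the question (the variable names are only illustrative). It shows that only the tags are removed: the text of the div and the bare text both remain, which is why strip_tags alone does not give the expected result.

```php
<?php
$page = "This is text.\n"
      . "<div class=\"text\">This is text in 'div' tag</div>\n"
      . "<p>This is text in 'p' tag</p>";

// No second argument: every tag is removed, but all of the text stays.
$plain = strip_tags($page);

// Second argument: <p> tags are kept, every other tag is still stripped.
$keepP = strip_tags($page, '<p>');

echo $plain, "\n---\n", $keepP, "\n";
```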

I hope you got the idea :)
Mystical
  • You're not addressing the first part of the question: how to extract the `p` tags from the string. There's a div tag in the OP's example code, for example. How would you process those? – Elias Van Ootegem Aug 24 '16 at 13:53
  • Like this: `strip_tags("<div class=\"text\">This is text in 'div' tag</div>")->replace('class="text"', "");` – Mystical Aug 24 '16 at 14:58
  • My point still stands: how do you extract the `div` string from the source string? You simply can't do that without parsing the DOM. Markup needs to be parsed in order for you to work out where the `p` tags sit, whether or not there are tags _inside_ that paragraph tag, and process them correctly. Your answer assumes, more or less, that the input string will always be in the format _"text contents"_, which clearly isn't the case. BTW: `strip_tags($string)->replace("attribute='value'", '');` is invalid syntax, and pointless: `strip_tags` will remove the attributes too – Elias Van Ootegem Aug 24 '16 at 15:41
  • Good point. I didn't completely understand what you meant in your previous comment. But it works on some occasions, when the user has a container div and uses a bit of dirty RegExp. BTW, sorry about the misguided replace. It should be `str_replace(find,replace,string,count)`, right? – Mystical Aug 24 '16 at 18:32
  • Not quite: Regex and markup don't work, as _"explained"_ in this classic SO answer: http://stackoverflow.com/a/1732454/1230836 – Elias Van Ootegem Aug 25 '16 at 09:53
  • Ah, I see. Thanks for the link! So, for the most efficient and correct answer, you have to use the DOMDocument class and parse the DOM. Although, even when parsing the DOM, instead of using `$node->textContent`, it is possible to use the `strip_tags(string, allowable_tags)` function. As seen here: http://stackoverflow.com/questions/39076511/how-to-split-an-html-string-into-chunks-in-php?noredirect=1&lq=1 – Mystical Aug 25 '16 at 10:06
  • Yes, `strip_tags` will work on `nodeValue`, but why bother calling an extra function, if the `DOMNode` already contains the text content without tags under `$node->textContent` in the first place? – Elias Van Ootegem Aug 25 '16 at 11:43