1

Hi :)

I'm trying to get all the text that does not match a regex, using PHP.

My regex is:

/(<[^>]+>)/is

I would like to get all the text inside the HTML tags.

I tried a negative lookahead, (?!(<[^>]+>)), and a lot of other things...

The input:

<html><head><title>Nice page</title></head>
<body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a> <a>sdfaf</a>
<br /><a href=http://www.riven.com> Et ca aussi <img src=wrong.image title="et encore ca">dd</a>
</body></html>

I want to match all the text inside the HTML tags with a regex, like:

" Nice page Hello World Ceci est un lien sdfaf Et ca aussi dd "

Thanks !! :)

Wiktor Stribiżew

3 Answers

2

Use DOMDocument to do that:

$dom = new DOMDocument;
$dom->loadHTML($yourstring);
$xp = new DOMXPath($dom);

foreach($xp->query('//text()') as $textNode) {
    echo $textNode->nodeValue, PHP_EOL;
}
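
To get the single space-separated string shown in the question, the loop above could collect the text nodes instead of echoing them. This is only a sketch: the function name html_text() is hypothetical, and libxml_use_internal_errors() is added on the assumption that the sloppy markup (unquoted attributes, unclosed tags) would otherwise trigger parser warnings.

```php
<?php
// Hypothetical helper: returns the text of an HTML string as one line.
function html_text(string $html): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $pieces = [];
    foreach ($xp->query('//text()') as $textNode) {
        // Skip text nodes that contain only whitespace.
        $text = trim($textNode->nodeValue);
        if ($text !== '') {
            $pieces[] = $text;
        }
    }
    // Join the pieces with single spaces.
    return implode(' ', $pieces);
}
```

Calling html_text($yourstring) should then give the text runs in document order, separated by one space each.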
Casimir et Hippolyte
0

This regex should select all the text content, piece by piece (in PHP, use it with preg_match_all(), since PCRE has no g flag):

/>([^<]+)/
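
As a rough sketch of how the pattern above might be applied in PHP: the two function names below are hypothetical. The first uses preg_match_all() with this pattern; the second is an alternative that splits on the tag pattern from the question with preg_split(), which also keeps text that comes before the first tag. Both assume that no attribute value contains a literal > character.

```php
<?php
// Hypothetical helper: text runs that follow a ">" (the pattern above).
function text_between_tags(string $html): array
{
    // preg_match_all() returns every match; $matches[1] is the capture group.
    preg_match_all('/>([^<]+)/', $html, $matches);
    $pieces = array_map('trim', $matches[1]);
    // Drop whitespace-only pieces and reindex the array.
    return array_values(array_filter($pieces, fn($s) => $s !== ''));
}

// Hypothetical alternative: split the input on the question's tag pattern.
function text_outside_tags(string $html): array
{
    $parts = preg_split('/<[^>]+>/s', $html, -1, PREG_SPLIT_NO_EMPTY);
    $pieces = array_map('trim', $parts);
    return array_values(array_filter($pieces, fn($s) => $s !== ''));
}
```

implode(' ', text_outside_tags($input)) would then build a single string like the one the question asks for.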

Nikita Skrebets
0

There is a strip_tags() function that does this without any further configuration:

<?php
$input = '<html><head><title>Nice page</title></head><body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a><a>sdfaf</a><br /><a href=http://www.riven.com> Et ca aussi <img src=wrong.image title="et encore ca">dd</a></body></html>';
print( strip_tags($input) );
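
One thing to watch for: strip_tags() only removes the tags, so text from adjacent elements ends up glued together (for example "Nice page" and "Hello World" above). A possible workaround, sketched here with a hypothetical helper name, is to put a space in front of every < before stripping and then collapse the whitespace. It assumes the text itself contains no literal < characters.

```php
<?php
// Hypothetical helper: strip tags but keep a space between text runs.
function strip_tags_spaced(string $html): string
{
    // Insert a space before each tag so neighbouring text does not merge.
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);
    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

With the $input from the snippet above, print(strip_tags_spaced($input)); should print the text with single spaces between the pieces.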