Get all text but not the regex match

Question

Hi :)

I'm trying to get all the text that does not match a regex. I'm using PHP.

My regex is:

/(<[^>]+>)/is

I would like to get all the text contained in the HTML elements, that is, everything that is not a tag.

I tried putting a negative lookahead, (?!(<[^>]+>)), in front, and a lot of other things...

The input:

<html><head><title>Nice page</title></head>
<body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a> <a>sdfaf</a>
<br /><a href=http://www.riven.com> Et ca aussi <img src=wrong.image title="et encore ca">dd</a>
</body></html>

I want to match all the text inside the HTML tags with a regex.

Like:

" Nice page Hello World Ceci est un lien sdfaf Et ca aussi dd "

Thanks !! :)

Please add an example of what you are trying to achieve; that would make it easier to understand and help. — Ntwobike, Oct 03 '18 at 13:32
I will never miss a chance to post this link: https://stackoverflow.com/a/1732454/3578036 — JustCarty, Oct 03 '18 at 13:55

score 2 · Answer 1 · answered Oct 03 '18 at 13:35

Use DOMDocument to do that:

$dom = new DOMDocument;
$dom->loadHTML($yourstring);
$xp = new DOMXPath($dom);

foreach($xp->query('//text()') as $textNode) {
    echo $textNode->nodeValue, PHP_EOL;
}
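
To get a single space-separated string like the one asked for, the text nodes can be trimmed and joined. This is a minimal sketch, not part of the original answer: the helper name extractText is hypothetical, and it assumes libxml warnings about the loose markup (unquoted attributes) should simply be suppressed.

```php
<?php
// Hypothetical helper (name is an assumption, not from the answer above).
function extractText(string $html): string
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them,
    // since the input markup is not strictly valid HTML.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $parts = [];
    foreach ($xp->query('//text()') as $textNode) {
        // Skip nodes that contain only whitespace (e.g. line breaks between tags).
        $text = trim($textNode->nodeValue);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    // Join the remaining pieces with a single space.
    return implode(' ', $parts);
}

$html = '<html><head><title>Nice page</title></head>
<body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a> <a>sdfaf</a>
<br /><a href=http://www.riven.com> Et ca aussi <img src=wrong.image title="et encore ca">dd</a>
</body></html>';

echo extractText($html), PHP_EOL;
```

Note that //text() also returns the contents of script and style elements if the page has any, so those may need to be filtered out separately.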

Casimir et Hippolyte


Nikita Skrebets · Answer 2 · 2018-10-03T14:00:57.610

0

This regexp should select all the text content, part by part:

/>([^<]+)/g
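
PHP's preg functions do not accept a g flag, so in PHP this pattern would probably be used with preg_match_all(). The following is a rough sketch under that assumption; the shortened input and the variable names are only for illustration. It will misbehave if an attribute value contains a > character, and it misses any text that comes before the first tag.

```php
<?php
// Shortened sample input, for illustration only.
$html = '<title>Nice page</title><body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a> <a>sdfaf</a></body>';

// No "g" flag in PHP: preg_match_all() returns every match of the pattern.
preg_match_all('/>([^<]+)/', $html, $matches);

// $matches[1] holds the captured text after each ">";
// trim each piece and drop the ones that were only whitespace.
$parts = array_filter(
    array_map('trim', $matches[1]),
    fn ($s) => $s !== ''
);

echo implode(' ', $parts), PHP_EOL;
```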

edited Oct 03 '18 at 14:00

answered Oct 03 '18 at 13:55

Nikita Skrebets


score 0 · Answer 3 · answered Oct 03 '18 at 13:58

There is a strip_tags() function that does this without any further configuration:

<?php
$input = '<html><head><title>Nice page</title></head><body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a><a>sdfaf</a><br /><a href=http://www.riven.com> Et ca aussi <img src=wrong.image title="et encore ca">dd</a></body></html>';
print( strip_tags($input) );

Luiz Surian


Yes, but the problem with `strip_tags` is that it concatenates all the text without a separator, which gives things like: `Nice pageHello World...` – Casimir et Hippolyte Oct 03 '18 at 14:39
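
One possible workaround for the missing separator, sketched here as an assumption rather than something proposed in the thread, is to put a space in front of every tag before calling strip_tags() and then collapse the extra whitespace. It assumes the text itself contains no literal < characters.

```php
<?php
$input = '<html><head><title>Nice page</title></head><body>Hello World <a href=http://cyan.com title="un lien">Ceci est un lien</a><a>sdfaf</a><br /><a href=http://www.riven.com> Et ca aussi <img src=wrong.image title="et encore ca">dd</a></body></html>';

// Put a space in front of every tag so adjacent text runs stay separated.
$spaced = str_replace('<', ' <', $input);

// Remove the tags, collapse runs of whitespace, and trim the ends.
$text = trim(preg_replace('/\s+/', ' ', strip_tags($spaced)));

echo $text, PHP_EOL;
```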
