Regex to select the text of an HTML file

Question

I need to select all the text in an HTML document (but not the tags) so that I can wrap it in a tag.

I tried to match the opposite of <[\/!a-zA-Z0-9]+(\s+\w+(=(\w+|(\"|').*(\"|')))?)*\s*\/?> with a negative lookahead, like this: /^((?!<[\/!a-zA-Z0-9]+(\s+\w+(=(\w+|(\"|').*(\"|')))?)*\s*\/?>).)*$/ (link)

All I want to do is convert this:

<!DOCTYPE html>
<html>
<body>
<h1>Heading</h1>
<div><p>paragraph.</p></div>
<p>paragraph.</p>
</body>
</html>

to this:

<!DOCTYPE html>
<html>
<body>
<h1><span>Heading</span></h1>
<div><p><span>paragraph.</span></p></div>
<p><span>paragraph.</span></p>
</body>
</html>

Please help. Thanks.

[obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) (HTML is not a regular language and shouldn't be parsed with regular expressions) — Sam, May 02 '14 at 19:45
First fix any syntax problems that make the HTML not a valid subset of XML. Then parse it with an XML parser. Regex won't work. — developerwjk, May 02 '14 at 19:49
[Oh yes you *can* parse HTML with patterns](http://stackoverflow.com/a/4234491/471272). — tchrist, Jun 06 '14 at 22:36

score -1 · Answer 1 · answered May 02 '14 at 19:54

As Sam said, you can't reliably read values from HTML with a regex. If you really wanted to read the text from the HTML, you would need to:

Fix any syntax problems that make the HTML not a valid subset of XML.
Then parse it with an XML parser. Regex won't work.
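
For illustration, here is a minimal sketch of the parser route in PHP. It swaps the strict XML parser for `DOMDocument::loadHTML`, which tolerates HTML that is not valid XML, so the clean-up step is handled by the parser. The function name `wrapTextInSpans` is made up for this example, and skipping whitespace-only text and `<script>`/`<style>` content is an assumption about what you want.

```php
<?php
// Sketch: wrap every non-blank text node inside <body> in a <span>.
// wrapTextInSpans is a hypothetical helper name, not a library function.
function wrapTextInSpans(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Text nodes under <body> that contain something other than whitespace,
    // excluding script and style content
    $nodes = $xpath->query(
        '//body//text()[normalize-space() and not(ancestor::script) and not(ancestor::style)]'
    );

    // Copy the result to an array so the loop is unaffected by the tree changes below
    foreach (iterator_to_array($nodes) as $text) {
        $span = $doc->createElement('span');
        // Put the new <span> where the text node was, then move the text inside it
        $text->parentNode->replaceChild($span, $text);
        $span->appendChild($text);
    }

    return $doc->saveHTML();
}
```

You would call it as `echo wrapTextInSpans(file_get_contents('page.html'));`, where `page.html` is a placeholder file name. The serializer may normalize the doctype and whitespace, so the output is not guaranteed to be byte-for-byte identical to the input outside the added spans.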

But after reading your question again, I see you don't really want to read text. What you apparently want to do is add spans in certain places, which can be done like this. Read each line and apply:

$line = str_replace("<h1>", "<h1><span>", $line);
$line = str_replace("</h1>", "</span></h1>", $line);
$line = str_replace("<p>", "<p><span>", $line);
$line = str_replace("</p>", "</span></p>", $line);

Of course, if the HTML wasn't well formed to begin with, the result won't be either.
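
As a rough sketch, the same four replacements can be done in one call, since `str_replace` accepts parallel arrays. The function name `addSpans` is made up for this example. The second call shows where the literal approach stops working: a tag with attributes is not matched.

```php
<?php
// Sketch: the four literal replacements from above, applied in a single call.
// addSpans is a hypothetical helper name.
function addSpans(string $html): string
{
    $search  = ['<h1>', '</h1>', '<p>', '</p>'];
    $replace = ['<h1><span>', '</span></h1>', '<p><span>', '</span></p>'];
    return str_replace($search, $replace, $html);
}

echo addSpans('<div><p>paragraph.</p></div>'), "\n";
// prints: <div><p><span>paragraph.</span></p></div>

// '<p class="x">' is not the literal '<p>', so only the closing tag is rewritten
// and the result is unbalanced:
echo addSpans('<p class="x">paragraph.</p>'), "\n";
// prints: <p class="x">paragraph.</span></p>
```

So this only fits input where you know the tags appear exactly as written, with no attributes and no variation in case or spacing.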
