0

I need to select all the text in some HTML code (but not the tags) so that I can wrap it in a tag.

I tried to match the opposite of <[\/!a-zA-Z0-9]+(\s+\w+(=(\w+|(\"|').*(\"|')))?)*\s*\/?> like this: /^((?!<[\/!a-zA-Z0-9]+(\s+\w+(=(\w+|(\"|').*(\"|')))?)*\s*\/?>).)*$/

All I want to do is convert this:

<!DOCTYPE html>
<html>
<body>
<h1>Heading</h1>
<div><p>paragraph.</p></div>
<p>paragraph.</p>
</body>
</html>

to this:

<!DOCTYPE html>
<html>
<body>
<h1><span>Heading</span></h1>
<div><p><span>paragraph.</span></p></div>
<p><span>paragraph.</span></p>
</body>
</html>

Please help. Thanks.

hooman naghiee
  • [obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) (HTML is not a regular language and shouldn't be parsed with regular expressions) – Sam May 02 '14 at 19:45
  • First fix any syntax problems that make the HTML not a valid subset of XML. Then parse it with an XML parser. Regex won't work. – developerwjk May 02 '14 at 19:49
  • [Oh yes you *can* parse HTML with patterns](http://stackoverflow.com/a/4234491/471272). – tchrist Jun 06 '14 at 22:36

1 Answer

-1

As Sam said, you can't reliably read values from HTML with regex. If you really wanted to read text from the HTML, you would need to:

  1. Fix any syntax problems that make the HTML not a valid subset of XML.
  2. Then parse it with an XML parser. Regex won't work.
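The parser route can also produce the spans directly. This is a minimal sketch, not a definitive implementation. It assumes PHP's DOM extension is available, and it swaps the strict XML parser for `DOMDocument::loadHTML`, an HTML parser that tolerates markup that is not valid XML, so step 1 can often be skipped. The function name `wrapTextInSpans` is hypothetical.

```php
<?php
// Hypothetical helper: wraps every non-blank text node inside <body> in a <span>.
function wrapTextInSpans(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Select text nodes that contain more than whitespace.
    // Copy them into an array first so the tree is not modified while iterating.
    $textNodes = iterator_to_array($xpath->query('//body//text()[normalize-space()]'));

    foreach ($textNodes as $textNode) {
        $span = $doc->createElement('span');
        // Put the empty span where the text was, then move the text into it.
        $textNode->parentNode->replaceChild($span, $textNode);
        $span->appendChild($textNode);
    }

    return $doc->saveHTML();
}
```

Calling `wrapTextInSpans()` on the HTML from the question wraps "Heading" and both "paragraph." texts in spans. Note that this query would also wrap text inside `<script>` or `<style>` elements if they appear in the body, so the XPath expression may need tightening for real pages.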

But after reading your question again, I see you don't really want to read text. What you apparently want to do is add spans in certain places, which can be done like this. Read each line and apply:

$line = str_replace("<h1>", "<h1><span>", $line);
$line = str_replace("</h1>", "</span></h1>", $line);
$line = str_replace("<p>", "<p><span>", $line);
$line = str_replace("</p>", "</span></p>", $line);

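Of course, if the HTML wasn't well formed to begin with, the result won't be either. These replacements also only match the exact strings <h1> and <p>, so tags that carry attributes are not handled.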
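The same replacements can be applied to a whole document in one call by passing arrays to `str_replace`. The sketch below is only an illustration of that idea; the function name `addSpansByReplace` is hypothetical.

```php
<?php
// Hypothetical helper: applies the four literal replacements in one call.
function addSpansByReplace(string $html): string
{
    $search  = ['<h1>', '</h1>', '<p>', '</p>'];
    $replace = ['<h1><span>', '</span></h1>', '<p><span>', '</span></p>'];

    // str_replace() accepts parallel arrays and replaces each pair in order.
    return str_replace($search, $replace, $html);
}
```

For `<h1>Heading</h1>` this returns `<h1><span>Heading</span></h1>`. For `<p class="intro">paragraph.</p>` only the closing tag matches, so the result is `<p class="intro">paragraph.</span></p>`, which is why this approach is only safe when the tags are known to have no attributes.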

developerwjk