4

I have some HTML paragraphs and I want to wrap every word in a `<span>`. This is what I have now:

$paragraph = "This is a paragraph.";
$contents = explode(' ', $paragraph);
$i = 0;
$span_content = '';
foreach ($contents as $c){
    $span_content .= '<span>'.$c.'</span> ';
    $i++;
}
$result = $span_content;

The code above works fine for plain text, but sometimes $paragraph contains HTML tags, for example:

$paragraph = "This is an image: <img src='/img.jpeg' /> This is a <a href='/abc.htm'/>Link</a>'";

How can I avoid wrapping the "words" inside HTML tags, so that the tags still work, while every other word is wrapped in a span? Thanks a lot!

Murad Hasan
user2335065
  • I guess you could check every `$c` for the presence of `<`; if it is there, do nothing, go to the next `$c`, and keep doing nothing until you find a part containing `>`. After that, continue adding `span`. This approach will break easily, though. – RST May 03 '16 at 09:13
  • If you are parsing html use an html parser. – mickmackusa Jul 14 '21 at 09:57

2 Answers

2

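How about a `(*SKIP)(*FAIL)` mechanism? The first alternative matches a whole tag and discards it, so only words outside tags reach `\b\w+\b`: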

<?php
$content = "This is an image: <img src='/img.jpeg' /> ";
$content .= "This is a <a href='/abc.htm'/>Link</a>";
$regex = '~<[^>]+>(*SKIP)(*FAIL)|\b\w+\b~';

$wrapped_content = preg_replace($regex, "<span>\\0</span>", $content);
echo $wrapped_content;

See a demo on ideone.com as well as on regex101.com.


To leave out the Link as well, you could go for:

(?:<[^>]+>     # same pattern as above
|              # or
(?<=>)\w+(?=<) # lookarounds with a word
)
(*SKIP)(*FAIL) # all of these alternatives shall fail
|
(\b\w+\b)

See a demo for this on regex101.com.
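The second pattern is written in free-spacing form, so in PHP it presumably needs the `x` modifier to be used as-is. A minimal sketch, assuming the same sample string as in the first snippet; the variable names and the `$1` replacement are my own choices, not part of the original demo:

```php
<?php
// Verbose form of the pattern above. The trailing "x" modifier tells PCRE
// to ignore whitespace and "#" comments inside the pattern.
$regex = '~
    (?:<[^>]+>       # a complete tag
    |                # or
    (?<=>)\w+(?=<)   # a word directly enclosed by tags
    )
    (*SKIP)(*FAIL)   # discard whatever matched so far
    |
    (\b\w+\b)        # any remaining word ends up in group 1
~x';

$content = "This is an image: <img src='/img.jpeg' /> "
         . "This is a <a href='/abc.htm'/>Link</a>";

// Only the last alternative can succeed, so $1 always holds the word.
$wrapped = preg_replace($regex, '<span>$1</span>', $content);
echo $wrapped, PHP_EOL;
```

Note that the lookarounds only protect a single word sitting directly between `>` and `<`; link text made of several words would still be wrapped.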

Jan
  • Nice one, but they both fail if quotes or special characters are used inside the text. Try with `Thi's` or `Thi’s`: http://ideone.com/KulE6h – Benn Nov 18 '16 at 11:11
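One possible adjustment for the apostrophe case raised in this comment, offered as a sketch rather than as the answerer's own fix: let a word continue across an inner `'` or `’`. The character class and the sample string are assumptions for illustration, and other special characters would need similar treatment.

```php
<?php
// Same (*SKIP)(*FAIL) idea, but a "word" may contain inner apostrophes:
// ' (U+0027) or the typographic one (U+2019). The "u" modifier makes
// PCRE treat both pattern and subject as UTF-8.
$regex = '~<[^>]+>(*SKIP)(*FAIL)|\w+(?:[\'\x{2019}]\w+)*~u';

// The \u{...} escape in a double-quoted string needs PHP 7 or later.
$content = "Thi's and Thi\u{2019}s <b>bold</b>";

$wrapped = preg_replace($regex, '<span>$0</span>', $content);
echo $wrapped, PHP_EOL;
```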
0

The short version is you really do not want to attempt this.

The longer version: if you are dealing with HTML then you need an HTML parser. You can't use regexes. Where it becomes even messier is that you are not starting with HTML but with an HTML fragment (which may or may not be well-formed). It might work if ... Hence you need to use an HTML parser to identify the non-HTML extents, separate them out and feed them into a secondary parser (which might well use regexes) for translation, then put the translated content back into the DOM before serializing the document.
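The steps described above can be sketched with PHP's bundled DOM extension. This is only an illustration under stated assumptions: `wrapWordsInSpans` is a hypothetical helper name, the fragment is wrapped in a temporary `<div>` so it has a single root, a "word" is any run of non-whitespace characters (as in the question's `explode` approach), and the input is assumed to be ASCII since no encoding handling is included.

```php
<?php
// Hypothetical helper (not from the answer): wraps every word found in a
// text node in <span>, leaving tags and attributes untouched.
function wrapWordsInSpans(string $fragment): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // fragments often trigger parser warnings
    // A container <div> gives the fragment a single root we can find again.
    $doc->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();

    $root  = $doc->getElementsByTagName('div')->item(0);
    $xpath = new DOMXPath($doc);

    // Copy the node list first, because the tree is modified while looping.
    $textNodes = iterator_to_array($xpath->query('.//text()', $root));

    foreach ($textNodes as $textNode) {
        $replacement = $doc->createDocumentFragment();
        // Split into alternating runs of whitespace and non-whitespace.
        $parts = preg_split('/(\s+)/', $textNode->nodeValue, -1,
                            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $part) {
            if (trim($part) === '') {
                // Keep the original whitespace between words.
                $replacement->appendChild($doc->createTextNode($part));
            } else {
                $span = $doc->createElement('span');
                $span->appendChild($doc->createTextNode($part));
                $replacement->appendChild($span);
            }
        }
        $textNode->parentNode->replaceChild($replacement, $textNode);
    }

    // Serialize only the children of the container, not the whole document.
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

echo wrapWordsInSpans("This is a <a href='/abc.htm'>Link</a>"), PHP_EOL;
```

Because the parser rebuilds the markup, the serialized output may differ cosmetically from the input (for example, attribute quoting), and text inside elements such as the link is wrapped too.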

Community
symcbean