What is wrong with the regular expression below?

$content = '
<span style="text-decoration: underline;">cultural</span> 
<span style="text-decoration: line-through;">heart</span>
<span style="font-family: " lang="EN-US">May</span>
';

$regex = '/<span style=\"text\-decoration\: underline\;\">(.*?)<\/span>/is';
if (!preg_match($regex,$content))
{

    $content = preg_replace("/<span.*?\>(.*?)<\/span>/is", "$1", $content);
}

What I want to do is remove all spans except those that have either

style="text-decoration: underline;" or

style="text-decoration: line-through;"

How can I fix it?
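Here is a minimal sketch of what goes wrong, using a trimmed copy of `$content` with one span to keep and one to strip. `preg_match()` gives a single yes/no answer for the whole string. As soon as any underline span is present, the `!preg_match(...)` condition is false, so nothing is stripped. Even when that branch does run, the second pattern matches every span and unwraps the ones you wanted to keep as well.

```php
<?php
// Trimmed version of the question's input: one span to keep, one to strip.
$content = '<span style="text-decoration: underline;">cultural</span>'
         . '<span style="font-family: " lang="EN-US">May</span>';

$regex = '/<span style=\"text\-decoration\: underline\;\">(.*?)<\/span>/is';

// One answer for the entire string, not one per span.
var_dump(preg_match($regex, $content)); // int(1), so the original if-branch is skipped

// If the branch did run, this pattern matches every <span>, kept ones included.
echo preg_replace("/<span.*?\>(.*?)<\/span>/is", "$1", $content), "\n"; // culturalMay
```

The fix is to make the keep/strip decision separately for each span instead of once for the whole string.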

Run
  • The correct approach would be to use a [DOMDocument](http://www.php.net/manual/en/class.domdocument.php) (HTML parser) and avoid regex like the plague. – Brad Christie Jul 22 '11 at 17:14
  • Please read my issue here `http://stackoverflow.com/questions/6793224/dom-parser-remove-certain-attributes-only`. Thanks. – Run Jul 22 '11 at 17:16
  • So you didn't want to heed the advice to access the `style` attribute and parse it? It's a much better approach than using regex on HTML. – Brad Christie Jul 22 '11 at 17:21
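If you do want to stay with regex, one sketch is to decide per span with `preg_replace_callback()`. This assumes spans are not nested and that attribute values use double quotes. It is fragile on real-world HTML, which is why the comments above recommend a parser.

```php
<?php
$content = '<span style="text-decoration: underline;">cultural</span> '
         . '<span style="text-decoration: line-through;">heart</span> '
         . '<span style="font-family: " lang="EN-US">May</span>';

// Match one <span ...>...</span> pair at a time. The lazy (.*?) stops at the
// first closing tag, so nested spans are NOT handled by this sketch.
$result = preg_replace_callback(
    '/<span\b([^>]*)>(.*?)<\/span>/is',
    function (array $m): string {
        // $m[0] = whole span, $m[1] = attribute text, $m[2] = inner content
        if (preg_match('/style\s*=\s*"[^"]*text-decoration:\s*(?:underline|line-through)\b/i', $m[1])) {
            return $m[0]; // keep underline / line-through spans untouched
        }
        return $m[2]; // unwrap every other span, keeping its content
    },
    $content
);

echo $result, "\n";
```

Unlike the original code, the decision here happens inside the callback, once per matched span.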


The DOM approach:

<?php
  $content = '<span style="text-decoration: underline;">cultural</span>'
           . '<span style="text-decoration: line-through;">heart</span>'
           . '<span style="font-family: " lang="EN-US">May</span>';

  $dom = new DOMDocument();
  $dom->loadHTML($content);

  // grab every span, then iterate over them. getElementsByTagName() returns a
  // live DOMNodeList that shrinks as spans are removed, so we re-read its
  // ->length on every pass and use a manual index ($s) instead of foreach
  $spans = $dom->getElementsByTagName('span');
  for ($s = 0; $s < $spans->length; $s++)
  {
    // grab the current span element
    $span = $spans->item($s);

    // read its style attribute ('' if there is none) and keep the span only
    // if it applies text-decoration: underline or line-through
    $style = $span->getAttribute('style');
    if (!preg_match('/text-decoration:\s*(?:underline|line-through)\b/i', $style))
    {
      // not one to keep. Referencing its parent, move all the children up the
      // tree (moving rather than shallow-cloning, so nested markup survives),
      // then remove the now-empty span itself
      $parent = $span->parentNode;
      while ($span->firstChild)
      {
        $parent->insertBefore($span->firstChild, $span);
      }
      $parent->removeChild($span);

      // decrement the index so we don't end up skipping over nodes
      $s--;
    }
  }

  // loadHTML() adds a doctype and <html><body> wrapper, so this prints a full document
  echo $dom->saveHTML();
Brad Christie
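A variation on the DOM loop can be wrapped in a reusable function that returns only the cleaned fragment, without the `<html><body>` wrapper that `loadHTML()` adds. This is a sketch: the function name `strip_unwanted_spans` is hypothetical, and the extra `<div>` wrapper is an assumption used only to locate the fragment's nodes afterwards. `saveHTML($node)` requires PHP 5.3.6 or later, and the type declarations require PHP 7.

```php
<?php
// Hypothetical helper: removes every <span> except underline / line-through ones.
function strip_unwanted_spans(string $html): string
{
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Wrap the fragment in a <div> so its children are easy to find afterwards.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $spans = $dom->getElementsByTagName('span'); // live list, shrinks on removal
    for ($s = 0; $s < $spans->length; $s++) {
        $span = $spans->item($s);
        $style = $span->getAttribute('style'); // '' when there is no style attribute
        if (!preg_match('/text-decoration:\s*(?:underline|line-through)\b/i', $style)) {
            $parent = $span->parentNode;
            while ($span->firstChild) {
                $parent->insertBefore($span->firstChild, $span); // moves the node
            }
            $parent->removeChild($span);
            $s--; // the list shrank; revisit this index
        }
    }

    // Serialize only the wrapper's children, one node at a time.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $node) {
        $out .= $dom->saveHTML($node);
    }
    return $out;
}

echo strip_unwanted_spans(
    '<span style="text-decoration: underline;">cultural</span> '
    . '<span lang="EN-US">plain</span> '
    . '<span style="font-family: " lang="EN-US">May</span>'
), "\n";
```

Because the decision reads the `style` attribute through the DOM, spans with no `style` at all (like the `lang="EN-US"` one) are unwrapped too, which matches the requirement in the question.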