
I want to use a PHP regex to remove all attributes from HTML, such as title="...", id="..." and class="...", except href. I used

$result = preg_replace('#[^(href)]="(.*?)"#is', '', $result);

but it doesn't work. Test online: http://www.phpliveregex.com/p/dcn

cuongbn

1 Answer


You really should consider using an SGML parser for this kind of work; regular expressions are not well suited to HTML processing. However, if a regex is the only tool available to you, you need to learn more about the syntax. At least one of your problems is the sub-expression [^(href)], which is a negated character class: it matches a single character that is not one of (, h, r, e, f, or ). It does not mean "anything other than the word href", which is probably what you intended.
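You can see the effect by running your original pattern against a small tag. This is just a diagnostic sketch; the sample markup is made up:

```php
<?php
// Diagnostic only: run the question's pattern on a made-up tag to see what it really removes.
// [^(href)] matches ONE character that is not (, h, r, e, f or ) (the i flag also excludes H, R, E, F).
$subject = '<a id="x" class="c" title="t" href="y">';
$result = preg_replace('#[^(href)]="(.*?)"#is', '', $subject);

echo $result, PHP_EOL; // <a i clas title="t" href="y">
```

title and href survive because the character right before their = (e and f) is in the excluded set, while id and class each lose only their last letter, since d and s are not.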

You could try using a negative look-ahead with back-references, but you might end up chewing through stuff you didn't intend to, or missing stuff you wanted. Consider the following HTML-ish snippet:

<p class="...">Properties like <a class="..." href="..."
name="...">href="..."</a> and <a href="..."
name="...">name="..."</a> should come after the &lt;a
and before the &gt;.</p>

<p class="..."><a name="..." href="..."><img
src="..." /></a><br class="..." />Fig. 1</p>
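Here is a minimal sketch of the look-ahead idea (this simple case doesn't need back-references), assuming attribute values are always double-quoted. The sample string is made up to mimic the snippet above:

```php
<?php
// Sketch only: strip every attribute except href with a negative look-ahead.
// (?!href\b) refuses to start a match at an attribute named exactly href;
// thanks to \b, something like hreflang is still stripped. Only "double-quoted" values are handled.
$pattern = '#\s+(?!href\b)[\w:-]+="[^"]*"#i';

$html = '<a class="c" href="y">Write title="x" here</a>';
$result = preg_replace($pattern, '', $html);

echo $result, PHP_EOL; // <a href="y">Write here</a>
```

The class attribute goes away as intended, but so does the title="x" that is just text inside the link, because the pattern has no idea whether it is inside a tag.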

You need to be able to tell when you've entered a tag (hence my recommendation about using an SGML parser), and it's not obvious how to ensure you're replacing the correct strings using just negative look-aheads.
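For comparison, this is roughly what the parser route can look like in PHP. It swaps in PHP's bundled DOM extension (DOMDocument, which uses libxml's HTML parser) for a general SGML parser, and keep_only_href is a hypothetical helper name:

```php
<?php
// Sketch: keep only href attributes by walking a parsed DOM instead of using a regex.
// keep_only_href is a hypothetical helper; it relies on PHP's bundled DOM extension.
function keep_only_href(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings (e.g. about unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // NOIMPLIED: don't wrap the fragment in <html><body>; NODEFDTD: don't add a doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        // Copy the names first so the attribute map isn't modified while we iterate it.
        $names = [];
        foreach ($element->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (strcasecmp($name, 'href') !== 0) {
                $element->removeAttribute($name);
            }
        }
    }

    return $doc->saveHTML();
}

echo keep_only_href('<p class="x"><a id="y" href="z" title="t">link</a></p>');
```

Because the parser knows where tags begin and end, attribute-like text in the content is left alone. Two caveats: loadHTML assumes ISO-8859-1 unless the markup declares a charset, and LIBXML_HTML_NOIMPLIED can mis-nest fragments that have more than one top-level element.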

preg_replace_callback may be better suited to your use case (i.e., use your $callback to preserve your href attributes, but filter everything else):

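// $subject is the HTML string to filter.
// Outer pattern: an opening or self-closing tag whose attributes are all quoted.
// Group 1 = tag name, group 2 = the attribute list, group 3 = optional whitespace and "/".
$filtered = preg_replace_callback('#<([^/\s]\S*)((?:\s+[^>=]+=(?:\'[^\']*\'|"[^"]*"))*)(\s*/?)>#is',
    function ($matches) {
        // Inner pattern: walk the attribute list one name="value" (or name='value') pair at a time.
        $filtered = preg_replace_callback('#\s+([^=]+)=(?:\'[^\']*\'|"[^"]*")#is',
            function ($matches) {
                // Keep href (in any letter case); drop every other attribute.
                return (strcasecmp(trim($matches[1]), 'href') !== 0
                    ? ''
                    : $matches[0]);
            }, $matches[2]);

        // Rebuild the tag from its name, the surviving attributes and the closing part.
        return ('<' . $matches[1] . $filtered . $matches[3] . '>');
    }, $subject);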

There's probably a simpler way to achieve the same thing, but this should give you the idea. By the way, running the HTML-ish snippet above through this code gives you:

<p>Properties like <a href="...">href="..."</a> and <a href="...">name="..."</a> should come after the &lt;a
and before the &gt;.</p>

<p><a href="..."><img /></a><br />Fig. 1</p>
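If you want to try the callback approach in isolation, here is the same pair of patterns wrapped in a function. The name strip_non_href_attributes and the sample inputs are hypothetical. Note the second call: because the outer pattern only matches tags whose attributes are all quoted, a tag with an unquoted value is passed through untouched, class and all.

```php
<?php
// Hypothetical wrapper around the two-pass preg_replace_callback approach.
function strip_non_href_attributes(string $html): string
{
    // Opening/self-closing tag with only quoted attributes: name, attribute list, trailing "/".
    $tag = '#<([^/\s]\S*)((?:\s+[^>=]+=(?:\'[^\']*\'|"[^"]*"))*)(\s*/?)>#is';
    // One quoted attribute inside the attribute list.
    $attr = '#\s+([^=]+)=(?:\'[^\']*\'|"[^"]*")#is';

    return preg_replace_callback($tag, function ($m) use ($attr) {
        // Keep href (any letter case), drop everything else.
        $kept = preg_replace_callback($attr, function ($a) {
            return strcasecmp(trim($a[1]), 'href') === 0 ? $a[0] : '';
        }, $m[2]);
        return '<' . $m[1] . $kept . $m[3] . '>';
    }, $html);
}

echo strip_non_href_attributes('<p class="x"><a id="y" href="z">t</a><br class="c" /></p>'), PHP_EOL;
// <p><a href="z">t</a><br /></p>

echo strip_non_href_attributes('<td width=10 class="c">'), PHP_EOL;
// <td width=10 class="c">  (unquoted value, so the outer pattern never matches)
```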

One or more of these tutorials might be of some help, depending on your learning style:

posita