I have a title="" attribute in an anchor that contains HTML. I'm trying to remove the title attribute entirely, but for whatever reason the preg_replace() calls I'm using will not work. I've tried:

$output = preg_replace( '/title=\"(.*?)\"/',  '', $output );
$output = preg_replace( '/\title="(.*?)"/',   '', $output );
$output = preg_replace( '` title="(.+)"`',    '', $output );

None of the above works, but I can use something like:

$output = str_replace( 'title', 'class', $output );

I only did that to prove that I was able to change something (and that I wasn't uploading the wrong file). The output looks like this:

<a href="#" title="<table border=\&quot;0\&quot; width=\&quot;100%\&quot; cellspacing=\&quot;0\&quot; cellpadding=\&quot;0\&quot;>
    <tbody>
        <tr>
            <td colspan=\&quot;2\&quot; align=\&quot;center\&quot; valign=\&quot;top\&quot;></td>
        </tr>
        <tr>
            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>
            table content
            </td>
            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>
            table content
            </td>
        </tr>
    </tbody>
</table>">Link Title</a>

So what I'm trying to do is filter $output and remove the title attribute entirely, including everything inside it. Why don't the preg_replace() calls above work, and what are my options?

Howdy_McGee

1 Answer

I would not use a regex to operate on [X]HTML; I'd use an HTML parser instead.
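For what the parser route might look like, here is a minimal sketch using PHP's bundled DOM extension. The helper name remove_title_attributes() is made up for illustration, and the sketch assumes the fragment is plain ASCII (other encodings need extra handling with DOMDocument) and that the DOM extension is enabled.

```php
<?php
// Hypothetical helper: remove every title attribute using PHP's DOM extension.
function remove_title_attributes(string $html): string
{
    // loadHTML() rejects an empty string on PHP 8, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Fragments and unusual markup trigger libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // DOMXPath::query() returns a static list, so removing attributes while looping is safe.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@title]') as $element) {
        $element->removeAttribute('title');
    }

    // loadHTML() wraps a fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

// Shortened stand-in for the markup in the question.
$output = "<a href=\"#\" title=\"<table border=\\&quot;0\\&quot;>\n<tbody></tbody>\n</table>\">Link Title</a>";
echo remove_title_attributes($output), PHP_EOL;
```

The parser decides where the attribute value ends, so newlines or entities inside the title do not matter here.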

But if you still want to use a regex, you can use one like this, where [\s\S] matches any character, including the newlines inside your title value (a plain . does not):

title="[\s\S]*?"

You can use it in code like this:

$re = "/title=\"[\\s\\S]*?\"/"; 
$str = "<a href=\"#\" title=\"<table border=\&quot;0\&quot; width=\&quot;100%\&quot; cellspacing=\&quot;0\&quot; cellpadding=\&quot;0\&quot;>\n    <tbody>\n        <tr>\n            <td colspan=\&quot;2\&quot; align=\&quot;center\&quot; valign=\&quot;top\&quot;></td>\n        </tr>\n        <tr>\n            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>\n            table content\n            </td>\n            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>\n            table content\n            </td>\n        </tr>\n    </tbody>\n</table>\">Link Title</a>"; 
$subst = ""; 

$result = preg_replace($re, $subst, $str);
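As a rough illustration of why the attempts in the question do nothing, the sketch below runs the first one against a shortened stand-in for the markup and compares it with two variants that do cross line breaks. The variable names and the shortened string are only for illustration. (The second attempt has a separate problem: \t in a pattern means a tab character, not a literal "t".)

```php
<?php
// Shortened stand-in for the question's markup; the title value spans several lines.
$output = "<a href=\"#\" title=\"<table>\n<tbody></tbody>\n</table>\">Link Title</a>";

// First attempt from the question: "." stops at newlines, so nothing matches
// and preg_replace() returns the string unchanged.
$unchanged = preg_replace('/title="(.*?)"/', '', $output);
var_dump($unchanged === $output); // bool(true)

// The s modifier lets "." match newlines as well, which is what [\s\S] does by hand.
// The leading \s* also swallows the space before the attribute.
$with_s = preg_replace('/\s*title="(.*?)"/s', '', $output);

// A negated class crosses newlines without any modifier and stops at the
// first double quote. This assumes the value contains no raw double quote.
$with_class = preg_replace('/\s*title="[^"]*"/', '', $output);

var_dump($with_s === $with_class); // bool(true)
echo $with_class, PHP_EOL;         // prints <a href="#">Link Title</a>
```

Both variants rely on the inner quotes being written as &quot; entities, as they are in the question's output; a raw double quote inside the value would end the match early.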

Update: For a clear example of why you shouldn't use regex to parse HTML, see Andrei P.'s comment below.

Federico Piazza
  • 1
    Why wouldn't yo͛ͫu ᵒ ̢̯̩̯̲̠u͖̦̫̣s̶̫̲̯̺̰̟̼e͎͓̪ ͍̰͙͚̜̀r͜e͈͔g͏̯̪e̢̠̮̠x ̦̫̝t͚͙͔̫̟̠̹o̴ ̢̮̻̳͍p̡̗a̞̞͕̰͕̺͞ͅr͈̥̼̗͇̝s̖͚̭̳e̷͕̘̜͍̘ ̼͍͝h͜t̸̙ml̷̘͎̬̜̞͉?̱͓̘͎̖ ̧̟̭̱I͢s̸̪̰̗ ̝͕̯͔̙͠t̮͟h̤̩̺̱͠e͉̩̬͉̩͜r̟̠e̡͇̭ ̡̻s̘̠ǫ̪̩̼͖̳̺m͎͈e̥͎t̛̫̖h̛̰̬̹͍̼̬i͉n͎̺̯̼g̠̗̤͍ ̬̥̜̗̭͖̯w̡̦̺̦r̛̪͕͚͚̖͔͇o̮͙̣̙͝n͎g̥͕ ̧w̙̗̟̪͖͈i̶̖͔̣̺͓̘̲t̬̪͍̹̙h͇͓̮ ̷̖t̜̪̙͕h̠̗͍a̡̲̟̯̲ṱ̰?̰͖͚ – Andrei P. Mar 19 '15 at 15:06
  • Maybe it's my computer that's messing up the above comment but same question as @AndreiP. - Why wouldn't you use regex to parse HTML? – Howdy_McGee Mar 19 '15 at 15:08
  • @AndreiP. lol! man, you rock. Thanks for the comment, I'll use exactly that why not – Federico Piazza Mar 19 '15 at 15:09
  • :S I don't get the joke, seriously, why not? – Howdy_McGee Mar 19 '15 at 15:10
  • 1
    @Howdy_McGee [I'**l**l just leav`e` t͗̎̂̀hͥ͊͒͋i̿s̀̏͋͗ ͥ͛̑ͧ̄͆̚her̓͋̄̇ͬͧͧe.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Andrei P. Mar 19 '15 at 15:12
  • 1
    @Howdy_McGee, Andrei posted a rhetorical comment. As you can see in his comment, HTML can mess up the regex engine if you have text like his. – Federico Piazza Mar 19 '15 at 15:13
  • @Fede + AndreiP. - thanks for your help, i'll read that answer. – Howdy_McGee Mar 19 '15 at 15:16
  • @AndreiP. how do you add that format? I'd like to learn that to improve my answers related to regex and html – Federico Piazza Mar 19 '15 at 15:37
  • @Fede I use [this site](http://www.eeemo.net/). And [here's the gist](https://gist.github.com/Rnhmjoj/5855334) written in Python. And to answer your question, no, I don't know how to specifically add that format using regex/html. It's a bit beyond my pay grade to dabble in that. I'm a newbie. – Andrei P. Mar 19 '15 at 16:33