I'm feeling pretty silly having to ask this, but I cannot get this to work to save my life...

What Works

preg_replace( '/(<[^>]+) onmouseout=".*?"/i', '$1', preg_replace( '/(<[^>]+) onmouseover=".*?"/i', '$1', $strHtml ) )

How can I combine these two preg_replace() calls into one (by combining the two regex patterns)?

My Attempt to Cleanup (Doesn't Work)

preg_replace( '/(<[^>]+) (onmouseover|onmouseout)=".*?"/i', '$1', $strHtml )

I want this preg_replace() function to remove all onmouseover AND onmouseout attributes from my HTML string. It appears to remove only one of the two attributes... What am I doing wrong?

UPDATE: Example String

<p><img src="http://www.bestlinknetware.com/products/204233spc.jpg" width="680" height="365"><br>   <a href="http://www.bestlinknetware.com/products/204233INST.pdf" target="_blank" onmouseover="MM_swapImage('Image2','','/Content/bimages/ins2.gif',1)" onmouseout="MM_swapImgRestore()"><img name="Image2" border="0" src="http://www.bestlinknetware.com/Content/bimages/ins1.gif"></a> </p> <p><strong>No contract / No subscription / No monthy fee<br> 1080p HDTV reception<br> 32db high gain reception<br> Rotor let you change direction of the antenna to find best reception</strong></p>  <a href=http://transition.fcc.gov/mb/engineering/dtvmaps/  target="blank"><strong>CLICK HERE</strong></a><br>to see HDTV channels available in your area.<br> <br/> ** TV signal reception is immensely affected by the conditions such as antenna height, terrain, distance from broadcasting transmission antenna and output power of transmitter. Channels you can watch may vary depending on these conditions. <br> <br/> <br/> <p>* Reception: VHF/UHF/FM<br/>   * Reception range: 120miles<br/>   * Built-in 360 degree motor rotor<br>   * Wireless remote controller for rotor (included)<br/>   * Dual TV Outputs<br>   * Easy Installation<br>   * High Sensitivity Reception<br>   * Built-in Super Low Noise Amplifier<br>   * Power : AC15V 300mA<br> <br/> Kit contents<br/> * One - HDTV Yagi antenna with built-in roter & amplifier<br/> * One - Roter control box<br/> * One - Remote for roter control box<br/> * One - 40Ft coax cable<br/> * One - 4Ft coax cable<br/> * One - power supply for roter control box</p>

UPDATE: Tool for Future Views of This Thread

https://regex101.com/

I could never figure out exactly how to use http://regexr.com/, so I tried this regex101.com site, and I have been loving it ever since. Highly recommended for anyone facing similar issues (that used a cut-and-paste regex pattern like I did originally...).

Derek Foulk
  • *Sigh.* http://stackoverflow.com/a/1732454/2812842 – scrowler Jan 12 '16 at 01:37
  • the first `+` is greedy and will match everything before the last `onmouseover` or `onmouseout`. So if they appear on the same line the first one will not be replaced – jack3694078 Jan 12 '16 at 01:37
  • So not to be super helpless, but how can I rework this pattern so that I can achieve the desired result? I tried removing the plus, which didn't work? (Regex is not my thing :() – Derek Foulk Jan 12 '16 at 01:46

1 Answer

The problem with your original expression is that it can remove at most one attribute per tag. The pattern is anchored to the opening < of the tag, and once a match has been replaced the search resumes after the end of that match, where there is no further < inside the same element to anchor a second match. The greedy [^>]+ only decides which attribute gets removed: it first consumes everything up to the closing > and then backtracks to the last onmouseover/onmouseout in the tag, so the capture group holds everything from the < up to that last attribute, including the earlier attribute you also wanted to get rid of, and $1 puts it all back. Making the repetition lazy ([^>]+?) would not help; it would remove the first attribute and leave the last one. Your nested version works because each call makes its own pass over the string and has only one attribute to find per tag.
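To see the anchoring effect in isolation, here is a small sketch. The shortened $tag string and the variable names are made up for illustration and are not from the question; the first pattern is the one from the question and the second only adds a ? to make the repetition lazy.

```php
<?php
// Hypothetical, shortened tag with both attributes inside one element.
$tag = '<a href="x.pdf" onmouseover="swap()" onmouseout="restore()">link</a>';

// Pattern from the question: the greedy [^>]+ backtracks to the LAST attribute,
// so only onmouseout is dropped and $1 restores onmouseover.
$greedy = preg_replace('/(<[^>]+) (onmouseover|onmouseout)=".*?"/i', '$1', $tag);

// Lazy variant: stops at the FIRST attribute, but the next search starts
// after that match and finds no "<" left inside the tag.
$lazy = preg_replace('/(<[^>]+?) (onmouseover|onmouseout)=".*?"/i', '$1', $tag);

echo $greedy, "\n"; // <a href="x.pdf" onmouseover="swap()">link</a>
echo $lazy, "\n";   // <a href="x.pdf" onmouseout="restore()">link</a>
```

Either way one attribute survives, which is why dropping the "<" anchor matters more than changing the greediness.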

If you want to do this in one call to preg_replace() then rather than trying to grab the text that you want to keep it makes more sense to look for text to remove (by substitution with an empty string):

preg_replace( '/(onmouseover|onmouseout)=".*?"/i', '', $strHtml )

You already had a non-greedy match on the attribute value (the .*?), and judging by your earlier code it was working well for you. Note that this particular expression doesn't cover every variation allowed in an HTML/XML document: it expects double-quoted values with no whitespace around the =, it leaves the space that preceded each attribute in place, and it will also match the same text outside of a tag. I trust that you can make a judgment call regarding whether this is generic enough for your needs.
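If the whitespace and quoting variations do matter, one possible way to loosen the expression is sketched below. This is an assumption-laden variant, not a tested replacement for a real HTML parser: the sample $html string and the variable names are invented for the example, and the pattern will still match attribute-like text that appears outside of a tag.

```php
<?php
// Hypothetical input mixing quote styles, letter case and spacing around "=".
$html = <<<'HTML'
<a href="x.pdf" onmouseover="swap('a')" onMouseOut = 'restore()'>link</a>
HTML;

// \s+ consumes the whitespace before the attribute so no stray space is left,
// and the value may be double-quoted, single-quoted or unquoted.
$pattern = '/\s+onmouse(?:over|out)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';

$clean = preg_replace($pattern, '', $html);

echo $clean, "\n"; // <a href="x.pdf">link</a>
```

Requiring leading whitespace also keeps the pattern from touching a differently named attribute that merely ends in onmouseover, such as a data-onmouseover attribute.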

shawnt00
  • OP accepted the answer so what's with the downvote? Because this is being applied to HTML? – shawnt00 Jan 12 '16 at 06:36
  • Your solution works, but *why*? What was wrong with the OP's regex? Without that explanation your answer is incomplete. – Alan Moore Jan 12 '16 at 08:50
  • That's fine and I'll add some. But much of that fell out in the early comments and OP readily accepted as soon as the answer was posted. – shawnt00 Jan 12 '16 at 08:52
  • Better, but the problem isn't really the greediness of the `[^>]+` part (`[^>]+?` wouldn't work either), it's the `<` anchoring the match to the beginning of the tag. – Alan Moore Jan 12 '16 at 09:37
  • Yes, you're right about that too. OP added the sample after I had answered so at that point we didn't even know this was all happening within a single html element. – shawnt00 Jan 12 '16 at 09:45
  • @shawnt00's answer is much cleaner than my original pattern. I was in a rush to get my project done, and I actually appreciated the quick response (even without the lengthy explanation). There are a couple of different types of users: users who are learning the craft (and benefit from complete explanations), and users who are grateful for quick assistance with odds and ends like this. I am the latter in most cases. Not saying answers without explanation are a great thing, but down-voting a correct answer seems a bit harsh... BUT everyone has a right to an opinion I suppose. Thanks @shawnt00! – Derek Foulk Jan 14 '16 at 00:34