
I have some HTML code like the following (just part of it):

<p>
  <strong>
    <div align="center">
      <a onclick="return hs.expand(this)" href="http://example.com/somesome.png">
        <img title="some-bla-bla-text" src="http://example.com/somesome.png" 
             alt="some-bla-bla-text" />
      </a>
    </div>
  </strong><br />
  <strong>
    <div align="center">...

and I want to strip it down to this:

<p>
  <strong>
    <div align="center">
      <img title="some-bla-bla-text" alt="some-bla-bla-text" />
    </div>
  </strong><br />
  <strong>
    <div align="center">...

How can I remove the <a onclick="return hs.expand(this)" href="http://example.com/somesome.png"> opening tag and its closing </a> tag from this string?

I think a regex that matches everything from <a onclick="return hs.expand(this)"....> through </a> would be very helpful.

rasputin
  • Possible duplicate of [Best methods to parse HTML with PHP](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php) – hakre Jul 29 '11 at 01:32
  • Related: [Using regex to filter attributes in xpath with php](http://stackoverflow.com/questions/6823032/using-regex-to-filter-attributes-in-xpath-with-php/6823087) – hakre Jul 29 '11 at 01:34

3 Answers


Regex isn't powerful enough to do this reliably, since HTML is not a regular language. It might work in some cases, but the result will be fragile code that can break when given different, perfectly valid HTML input. You should look into PHP's DOMDocument class, which parses HTML into a tree that you can query and modify.
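A minimal sketch of the DOMDocument approach, assuming the goal is to unwrap only anchors whose onclick contains "hs.expand" and keep their children (the <img>) in place. The function name unwrapExpandLinks and the marker id "unwrap-root" are made up for this example. It assumes ASCII or UTF-8 input; loadHTML() guesses ISO-8859-1 when no charset is declared, so non-ASCII text may need extra handling, and libxml may also repair invalid nesting such as a <div> inside a <p>.

```php
<?php
// Hypothetical helper: unwraps every <a> whose onclick contains "hs.expand",
// moving the anchor's children (such as the <img>) to where the anchor was.
function unwrapExpandLinks(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a marker <div> so its contents can be pulled out later.
    $doc->loadHTML('<div id="unwrap-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[contains(@onclick, "hs.expand")]') as $a) {
        // insertBefore() moves each child out of the anchor, preserving order.
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only what is inside the marker <div>.
    $root = $xpath->query('//div[@id="unwrap-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrapExpandLinks(
    '<div align="center"><a onclick="return hs.expand(this)" href="http://example.com/somesome.png">'
    . '<img title="some-bla-bla-text" src="http://example.com/somesome.png" alt="some-bla-bla-text" /></a></div>'
), "\n";
```

Unlike a regex, this keeps working if the attributes are reordered or the markup is spread over several lines, and plain links without onclick are left alone.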

Paul

With some testing and tweaking, you might be able to get something like the following to work:

$html = preg_replace('/<a[^>]*>((?:(?!<\/a>).)*)<\/a>/is', '$1', $html);

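It basically says: find an opening a tag, capture everything up to the next closing a tag, and replace the whole match with what was captured. Note that this unwraps every anchor, not only those with an onclick attribute.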
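One pitfall worth checking while tweaking a pattern like this: when the * sits outside the capture group, as in ((?!\<\/a\>).)*, PCRE overwrites the group on every repetition, so \1 ends up holding only the last character. Moving the repetition inside the group (through a non-capturing (?:...)) keeps the whole inner text, and the s modifier lets . match the newlines in multi-line markup. A small comparison sketch on a made-up one-line input:

```php
<?php
$html = '<a href="x.png"><img src="x.png" /></a>';

// Quantifier OUTSIDE the group: the group is re-entered on every repetition,
// so \1 keeps only the last character it matched.
$lastCharOnly = preg_replace('/\<a[^>]*\>((?!\<\/a\>).)*\<\/a\>/i', '\1', $html);

// Quantifier INSIDE the group via (?:...), plus the s flag so "." also matches newlines.
$whole = preg_replace('/<a[^>]*>((?:(?!<\/a>).)*)<\/a>/is', '$1', $html);

echo $lastCharOnly, "\n"; // >
echo $whole, "\n";        // <img src="x.png" />
```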

Josh Coady

You can probably do what you want with regexes, but you need to provide more details. Do you want to remove all anchor elements, replacing them with whatever was inside them? Or only those that contain IMG tags? Here's a regex that peels off only those anchor tags whose first attribute is onclick:

$s= preg_replace('~\s*<a\s+onclick="[^"]*"[^>]*>((?:(?!</a>).)*)</a>\s*~is', '$1', $s);

See a demo on ideone.com.
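A minimal local equivalent of such a demo, assuming PHP 5.3+ for the nowdoc syntax, runs the regex above on a trimmed copy of the question's markup (closed off so it is complete). The leading and trailing \s* also swallow the whitespace just outside the anchor, so only the whitespace that was inside the <a> remains around the <img>.

```php
<?php
// Trimmed copy of the question's markup, closed off so it is complete.
$s = <<<'HTML'
<p>
  <strong>
    <div align="center">
      <a onclick="return hs.expand(this)" href="http://example.com/somesome.png">
        <img title="some-bla-bla-text" src="http://example.com/somesome.png"
             alt="some-bla-bla-text" />
      </a>
    </div>
  </strong><br />
</p>
HTML;

// ~ delimiters avoid escaping "/"; i = case-insensitive, s = "." also matches newlines.
$s = preg_replace('~\s*<a\s+onclick="[^"]*"[^>]*>((?:(?!</a>).)*)</a>\s*~is', '$1', $s);
echo $s, "\n";
```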


EDIT: This regex matches an anchor element that has an onclick attribute anywhere among its attributes (not necessarily first):

'~\s*<a[^>]*\s+onclick="[^"]*"[^>]*>((?:(?!</a>).)*)</a>\s*~is'
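A quick sketch of the EDIT pattern on two made-up inputs: one where onclick comes after href, which gets unwrapped, and one plain link without onclick, which is left untouched. Like the first version, it unwraps any anchor that has an onclick attribute, not only the hs.expand ones.

```php
<?php
$pattern = '~\s*<a[^>]*\s+onclick="[^"]*"[^>]*>((?:(?!</a>).)*)</a>\s*~is';

$samples = [
    // onclick after href: the anchor is unwrapped
    '<div><a href="big.png" onclick="return hs.expand(this)"><img src="big.png" /></a></div>',
    // no onclick at all: left untouched
    '<div><a href="/about">About</a></div>',
];

$results = [];
foreach ($samples as $html) {
    $results[] = preg_replace($pattern, '$1', $html);
}
echo implode("\n", $results), "\n";
// <div><img src="big.png" /></div>
// <div><a href="/about">About</a></div>
```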


Alan Moore
  • Yes, I wanted "a regex that peels off only those anchor tags that have an onclick attribute (not necessarily first, but required)". Thanks a lot. Can you make some changes to match these? – rasputin Jul 29 '11 at 12:38