
We have HTML code that looks like:

<h1><a name="_Toc22332223">Creating a record</a></h1>
<h1><a name="sectionB">Creating a record</a></h1>

Is there a regular expression we can use to find and delete the <a name=...> tags and leave just the text, like this: <h1>Creating a record</h1>

We also do not want to remove other hyperlinks, such as <a href>.

I tried <a name="[0-9]*">.+</a> to no avail.

Thanks!

user635800
  • You would probably save yourself a ton of headaches from fringe cases (something has 2 spaces, has both an href and a name, no closing `</a>`, etc.) by just learning to use a DOM parser. – Jonathan Kuhn Apr 09 '14 at 21:22
  • Why not use [**DOMDocument**](http://www.php.net/manual/en/domdocument.loadhtml.php)? – Rahil Wazir Apr 09 '14 at 21:23
  • I laugh just thinking of [this post](http://stackoverflow.com/a/1732454/1437016) whenever I see regex and HTML in the same sentence (no offense). – Maen Apr 09 '14 at 21:26
  • What result are you expecting? Does the original string get replaced? Or what? – Rahil Wazir Apr 09 '14 at 21:35

1 Answer


As others have suggested, DOM parsing is the most reliable way.
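A minimal sketch of the parser route, assuming Python since the question does not name a language. It uses only the standard library's `html.parser` rather than a full DOM, and the names `NamedAnchorStripper` and `strip_named_anchors` are made up for illustration. It re-emits the markup as it goes, skipping only `<a>` tags that have a `name` attribute and no `href`:

```python
from html.parser import HTMLParser


class NamedAnchorStripper(HTMLParser):
    """Re-emit HTML, dropping <a name=...> wrappers that carry no href."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.anchor_kept = []  # one flag per currently open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            names = {name for name, _ in attrs}
            keep = not ("name" in names and "href" not in names)
            self.anchor_kept.append(keep)
            if not keep:
                return  # drop the opening <a name=...>
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are passed through untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Drop the </a> that closes a dropped <a name=...>.
        if tag == "a" and self.anchor_kept and not self.anchor_kept.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_named_anchors(html):
    parser = NamedAnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_named_anchors('<h1><a name="sectionB">Creating a record</a></h1>'))
```

Because the decision is made per tag from its parsed attributes, attribute order and extra whitespace do not matter, and `<a href>` links pass through unchanged.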

But if it has to be very simple, you can use the following regex and replace each match with its first capture group:

<[aA]\s+name\s*=[^>]*>(.*?)<\/a>

Example on http://rubular.com/r/cI2CTwUCy3
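As a rough illustration of using this kind of pattern for the actual find-and-replace, here is a sketch in Python (an assumption, since the question does not name a language). The helper name `strip_named_anchors_regex` is hypothetical. The capture group is lazy (`.*?`) so that each match stops at the first `</a>` and the group holds the complete link text, which is then substituted for the whole match:

```python
import re

# Matches an <a> whose first attribute is name=..., capturing the link text.
# Lazy (.*?) keeps two anchors on one line from being merged into one match.
NAMED_ANCHOR = re.compile(r'<a\s+name\s*=[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def strip_named_anchors_regex(html):
    # Replace the whole match with group 1, i.e. just the text inside the tag.
    return NAMED_ANCHOR.sub(r'\1', html)


sample = (
    '<h1><a name="_Toc22332223">Creating a record</a></h1>\n'
    '<p><a href="page.html">A normal link</a></p>'
)
print(strip_named_anchors_regex(sample))
```

The asker's own attempt, `<a name="[0-9]*">.+</a>`, fails on the sample because neither `_Toc22332223` nor `sectionB` consists only of digits. Note that this sketch still has the usual regex limitations: it would also strip an anchor written as `<a name="x" href="y">`, and it misses one where `name` is not the first attribute.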

vaichidrewar