Regex to delete
Asked Apr 09 '14 at 21:17

Active Apr 09 '14 at 21:54

Viewed 73 times

Question

We have HTML code that looks like:

<h1><a name="_Toc22332223">Creating a record</a><h1>
<h1><a name="sectionB">Creating a record</a><h1>

Is there a regular expression we can use to find and delete the <a name=> tags and leave only the text, like this: <h1>Creating a record<h1>

We also do not want to remove other hyperlinks like <a href>

I tried <a name="[0-9]*">.+</a> to no avail.

Thanks!

You would probably save yourself a ton of headaches from fringe cases (something has 2 spaces, has both an href and a name, no closing tag, etc.) by just learning to use a DOM parser. — Jonathan Kuhn, Apr 09 '14 at 21:22
Why not use [**DOMDocument**](http://www.php.net/manual/en/domdocument.loadhtml.php)? — Rahil Wazir, Apr 09 '14 at 21:23
I laugh just by thinking of [this post](http://stackoverflow.com/a/1732454/1437016) whenever I see regex and html in the same sentence (no offense) — Maen, Apr 09 '14 at 21:26
What result are you expecting? Does the original string get replaced? Or what? — Rahil Wazir, Apr 09 '14 at 21:35

score 1 · Answer 1 · answered Apr 09 '14 at 21:54

As suggested by others, DOM parsing is the most reliable way.
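For illustration only, here is a minimal sketch of the parser-based idea in Python. It is not a DOM parser and not the PHP DOMDocument approach linked in the comments: it uses the standard-library html.parser tokenizer as a stand-in. The class and function names (NamedAnchorStripper, strip_named_anchors_parsed) are made up for this example, and the rule "drop an <a> only when it has a name attribute and no href" is an assumption about what the question wants.

```python
from html.parser import HTMLParser


class NamedAnchorStripper(HTMLParser):
    """Re-emit HTML, dropping <a name=...> wrappers that carry no href."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through untouched.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.dropped = []  # one flag per open <a>: True if its start tag was dropped

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            names = {key for key, _ in attrs}
            drop = "name" in names and "href" not in names
            self.dropped.append(drop)
            if drop:
                return  # skip the opening tag, keep whatever is inside it
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.dropped and self.dropped.pop():
            return  # closing tag of a dropped anchor
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_named_anchors_parsed(html):
    parser = NamedAnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_named_anchors_parsed('<h1><a name="sectionB">Creating a record</a><h1>'))
```

Because the decision is made on parsed attributes rather than on raw text, extra spaces or a different attribute order do not matter, and an anchor that has both name and href is left alone. Closing tags are re-emitted in lowercase, so the output is not guaranteed to be byte-for-byte identical outside the removed anchors.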

But if it has to be very simple, you can use the following regex and replace each match with the first capture group:

<[aA]\s+name\s*=[^>]*>(.*?)<\/[aA]>
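As a rough sketch of how a pattern of this shape could be applied, the Python snippet below uses re.sub with a lazy capture group and the IGNORECASE flag in place of [aA]. The function name strip_named_anchors and the sample strings are made up for the example.

```python
import re

# Opening <a> tag whose first attribute is name=..., then the link text
# (captured lazily so the match stops at the nearest </a>), then the closing tag.
ANCHOR_NAME = re.compile(r'<a\s+name\s*=[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def strip_named_anchors(html):
    """Replace each <a name=...>text</a> with just its text."""
    return ANCHOR_NAME.sub(r'\1', html)


sample = (
    '<h1><a name="_Toc22332223">Creating a record</a><h1>\n'
    '<p><a href="page.html">Keep this link</a></p>'
)
print(strip_named_anchors(sample))
# <h1>Creating a record<h1>
# <p><a href="page.html">Keep this link</a></p>
```

The pattern from the question, <a name="[0-9]*">.+</a>, fails on these inputs because [0-9]* only accepts names made up entirely of digits, while _Toc22332223 and sectionB both contain other characters. Note that the sketch only looks at the first attribute: <a name="x" href="y"> would be stripped along with its link, and <a href="y" name="x"> would be left untouched.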

vaichidrewar
