If I have a bunch of URLs like this:

<li><a href="http://www.xyz.com/sometext/someothertext/123/sometext/">Xyz 123</a></li>  
<li><a href="http://www.xyz.com/345/sometext/someothertext/">Xyz 345</a></li>

What regex would erase the URLs inside the href attributes so that they become:

<li><a href="">Xyz 123</a></li> 
<li><a href="">Xyz 345</a></li>

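The following should do what you want:

/href=\"([^\"]*)\"/

Basically, this matches href=" followed by any run of characters other than a double quote, and then the closing quote. Replace each match with href="" to empty the attribute.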
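A minimal sketch of this match-and-replace in Python, using the standard `re` module. It assumes every href value is wrapped in double quotes; single-quoted or unquoted values are left untouched. The function name `clear_hrefs` is hypothetical, chosen for illustration.

```python
import re

# Matches href=" plus any run of characters that are not a double quote,
# plus the closing quote. Assumes attribute values use double quotes.
HREF_ATTR = re.compile(r'href="[^"]*"')


def clear_hrefs(html: str) -> str:
    """Replace every double-quoted href value with an empty string."""
    return HREF_ATTR.sub('href=""', html)


if __name__ == "__main__":
    sample = (
        '<li><a href="http://www.xyz.com/sometext/someothertext/123/sometext/">Xyz 123</a></li>\n'
        '<li><a href="http://www.xyz.com/345/sometext/someothertext/">Xyz 345</a></li>'
    )
    print(clear_hrefs(sample))
```

Run on the question's two lines, this prints them with href="" in place of each URL. Because `[^"]*` cannot cross a quote, each match stops at the end of its own attribute, so several links on one line are handled independently.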

jjnguy

Search for <a href="[^"]*" and replace with <a href="".

If you add more details about which language you're using, I can be more specific. Be aware also that regular expressions are usually not the tool of choice when dealing with HTML.

Tim Pietzcker
*regular expressions are usually not the tool of choice when dealing with HTML* True in general, but this subset (an href attribute) is regular, so a regex is fine for this particular case. (+1 anyway, but I just wanted to make that clear.) – Platinum Azure Dec 27 '10 at 18:30

First of all, do not use regex to parse HTML. Why? Have a look here or here.

Process the HTML with an XML reader or XML document-processing engine, then use XPath to find the nodes matching your criteria and change their href attributes in the DOM.

Note: for HTML that is not well-formed XML, a more general HTML (SGML) parser is required.
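A minimal sketch of the DOM approach in Python, using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset (enough for `.//a[@href]`). It assumes the input is well-formed XML: the question's snippet has several top-level `<li>` elements, so the sketch wraps it in a temporary `<ul>` root, and input with unescaped `&` or unclosed tags will fail to parse. For real-world HTML, a lenient parser such as `lxml.html` or BeautifulSoup would be the more general option. The function name `clear_hrefs_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def clear_hrefs_xml(fragment: str) -> str:
    """Empty every <a href> in a well-formed XML/XHTML fragment via the DOM."""
    # The fragment has several top-level <li> elements, so wrap it in a
    # temporary <ul> element to give the XML parser a single root.
    root = ET.fromstring(f"<ul>{fragment}</ul>")
    # ElementTree understands a limited XPath subset: select every <a>
    # descendant that carries an href attribute.
    for anchor in root.iterfind(".//a[@href]"):
        anchor.set("href", "")
    # Serialize the children again, dropping the temporary wrapper.
    # tostring() also emits each child's tail text, so whitespace between
    # the <li> elements survives; root.text covers any leading text.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


if __name__ == "__main__":
    sample = (
        '<li><a href="http://www.xyz.com/sometext/someothertext/123/sometext/">Xyz 123</a></li>\n'
        '<li><a href="http://www.xyz.com/345/sometext/someothertext/">Xyz 345</a></li>'
    )
    print(clear_hrefs_xml(sample))
```

Unlike the regex versions, this edits the attribute on the parsed element, so quoting style, attribute order, and extra whitespace in the source do not matter once the input parses.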

Ondrej Tucny

I partially agree with the others, but a more complete version would be:

s/(<a[^>]+href\s*=\s*")(.*?)("[^>]*>)/$1$3/gi
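The substitution above uses Perl-style syntax: it captures the opening of an `<a` tag up to `href="` (group 1), the URL (group 2), and the closing quote plus the rest of the tag (group 3), then writes back only groups 1 and 3. A minimal sketch of the same pattern with Python's `re` module, where `\1\3` plays the role of `$1$3` and `re.IGNORECASE` replaces the `i` flag (`re.sub` already replaces every match, like `g`). The name `clear_a_hrefs` is hypothetical, and the sketch still assumes double-quoted href values.

```python
import re

# group 1: "<a", any other attributes, then href=" (spaces allowed around =)
# group 2: the URL itself; lazy, so it stops at the first quote that is
#          followed by the rest of the opening tag
# group 3: the closing quote and the remainder of the opening tag
A_HREF = re.compile(r'(<a[^>]+href\s*=\s*")(.*?)("[^>]*>)', re.IGNORECASE)


def clear_a_hrefs(html: str) -> str:
    """Drop the href value of each <a> tag, keeping groups 1 and 3."""
    return A_HREF.sub(r"\1\3", html)


if __name__ == "__main__":
    print(clear_a_hrefs('<li><a class="x" href="http://www.xyz.com/345/">Xyz 345</a></li>'))
```

Anchoring on `<a` means href attributes on other elements (for example `<link href="...">`) are left alone, which the bare `href="[^"]*"` pattern would not do.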
Sean Patrick Floyd