3

I need to strip link tags from a body of text but keep the anchor text. For example:

<a href ="">AnchorText</a>

needs to become just:

AnchorText

I was considering using the following RegEx:

<(.{0}|/)(a|A).*?>

Is a RegEx the best way to go about this? If so, is the above RegEx pattern adequate? If RegEx isn't the way to go, what's a better solution? This needs to be done server side.

RandomWebGuy
  • Are you trying to do this at design time? run time? Is this on a page you control or which you are downloading to a client? – Cos Callis Apr 07 '11 at 20:51
  • It needs to happen server side on the fly. There are multiple pages that will be formatted and all exist on the server. Eventually they will be presented as a download. – RandomWebGuy Apr 07 '11 at 20:57
  • Great, looks like you found a good answer. – Cos Callis Apr 07 '11 at 22:00

5 Answers

5

Your regex will do the job. You can write it a bit simpler as

</?(a|A).*?>

/? means zero or one /, so it's equivalent to your (.{0}|/)
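
One caveat worth checking before relying on either form: the pattern only requires the tag name to start with a or A, so it also matches tags such as <abbr> or <address>. Below is a minimal JavaScript sketch comparing the pattern above with a tighter variant. The names loosePattern, anchorPattern and stripAnchorTags are illustrative and not from the thread, and the tighter pattern is an assumption about what is wanted (remove only <a ...> and </a>, case-insensitively). The lookahead syntax should carry over to most server-side regex engines, but that is worth verifying for yours.

```javascript
// Pattern from the answer above: matches any tag whose name starts with a or A.
const loosePattern = /<\/?(a|A).*?>/g;

// Tighter variant (an assumption, not from the thread): the tag name must be
// exactly "a", i.e. followed by whitespace, "/" or ">". The i flag covers <A>.
const anchorPattern = /<\/?a(?=[\s>\/])[^>]*>/gi;

// Hypothetical helper: removes opening and closing anchor tags, keeps the text.
function stripAnchorTags(html) {
  return html.replace(anchorPattern, "");
}

const sample = '<a href ="">AnchorText</a> and <abbr title="x">HTML</abbr>';

// The loose pattern also removes the <abbr> tags.
console.log(sample.replace(loosePattern, ""));
// The tighter pattern leaves the <abbr> tags in place.
console.log(stripAnchorTags(sample));
```

Neither pattern copes with a literal > inside an attribute value; if the input can contain that, an HTML parser is the safer route.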

stema
3

You could just use HtmlAgilityPack:

string sampleHtml = "<a href =\"\">AnchorText</a>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sampleHtml);
string text = doc.DocumentNode.InnerText; // "AnchorText"; note InnerText drops every tag, not only <a>
CodingYourLife
BrokenGlass
1

I think a regex is the best way to accomplish this, and your pattern looks like it should work.

John Batdorf
  • I heard somewhere that Regex's shouldn't be used for HTML - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 =D – Tejs Apr 07 '11 at 20:50
  • Regex is not good for parsing HTML into tags, attributes, etc., but in this case it is OK. – ludesign Apr 07 '11 at 20:52
1

Use jQuery replaceWith:

$('a').replaceWith(function()
{
    return $('<span/>').text($(this).text());
});

Assuming you are doing this on the client side.

Tejs
  • I should have specified. I need to do this server side; otherwise I like your jQuery solution. Question edited/updated. – RandomWebGuy Apr 07 '11 at 20:58
1

I have been trying to do the same and found the following solution:

  1. Export the text to CSV.
  2. Open the file in Excel.
  3. Run Replace with <*> as the search term, which will remove the link tags and leave the anchor text.
  4. Import the result again to overwrite existing content.
Michael Schmeißer
Lee