5

Can someone help me create a regular expression in C# (.NET) that adds target="_blank" to all <a> tag links in my content?

If the link already has a target set then replace it with "_blank". The purpose is to open all links in my content in a new window.

Appreciate your help

-dotnet rocks

Oleks
dotnetrocks
  • dotnetrocks, but don't parse HTML with regex: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Amarghosh May 11 '10 at 04:43
  • Looking at the [specification for XML](http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-starttags), this task might be possible for valid XHTML input. But it will look ugly. – Christian Semrau May 14 '10 at 21:30
  • A simple regex would probably match inside comments and CDATA areas, which might or might not be a problem for you. These complications are the reason why here on Stack Overflow one usually gets a certain link as an answer for "parse HTML with regex" questions. – Christian Semrau May 14 '10 at 21:34
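The concern in the comment above can be illustrated with a small sketch. The pattern and the names `CommentPitfallDemo` and `AddTargetNaively` are assumptions made up for this illustration, not code from the thread; the point is only that a simple pattern cannot tell a live anchor from one sitting inside an HTML comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentPitfallDemo
{
    // Hypothetical helper: naively inserts target="_blank" into every
    // opening <a ...> tag it can find, wherever that tag appears.
    public static string AddTargetNaively(string html)
    {
        return Regex.Replace(
            html,
            @"<a\b([^>]*)>",
            @"<a target=""_blank""$1>",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // The first anchor is commented out, yet it is rewritten as well.
        string html = "<!-- <a href=\"old\">old</a> --><a href=\"new\">new</a>";
        Console.WriteLine(AddTargetNaively(html));
    }
}
```

Whether rewriting commented-out markup matters depends on the content; for output that is only ever rendered in a browser it is usually harmless.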

4 Answers

10

Using regex to parse HTML is widely discouraged, so you could use Html Agility Pack for this instead:

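// Requires the HtmlAgilityPack package and "using HtmlAgilityPack;".
HtmlDocument document = new HtmlDocument();
document.LoadHtml(yourHtml);

// SelectNodes returns null (not an empty collection) when nothing matches.
var links = document.DocumentNode.SelectNodes("//a");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        if (link.Attributes["target"] != null)
        {
            // Overwrite an existing target.
            link.Attributes["target"].Value = "_blank";
        }
        else
        {
            link.Attributes.Add("target", "_blank");
        }
    }
}

// The modified markup is available from the document node.
string result = document.DocumentNode.OuterHtml;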

This adds target='_blank' to every anchor in your document, replacing the existing target value where there is one.

carla
Oleks
7
Regex.Replace(inputString, @"<(a)\b([^>]*)>", @"<$1 target=""_blank""$2>")

Note that this also adds target to anchor tags that already have one, so those tags end up with two target attributes.

Edwin de Koning
Avinash Nigam
2

I did this with an extension method similar to the approach Alex showed. The method:

// Returns the input HTML with the "target" attribute of every link set to the specified value.
// Links without a target attribute get one added; existing attribute values are overwritten.
// As an extension method, this must be declared inside a static class.
public static string SetHtmlLinkTargetAttribute(this string inputHtmlString, string target)
{
    var htmlContent = new HtmlDocument();
    htmlContent.LoadHtml(inputHtmlString);

    // Find all links; SelectNodes returns null when there are none
    var links = htmlContent.DocumentNode.SelectNodes("//a");
    if (links == null)
    {
        return inputHtmlString;
    }

    foreach (var link in links)
    {
        link.SetAttributeValue("target", target);
    }

    return htmlContent.DocumentNode.OuterHtml;
}

And using it to clean up my links:

// Enforce targets for links as "_blank" to open in new window
asset.Description = asset.Description.SetHtmlLinkTargetAttribute("_blank");
Noah Stahl
0
Regex.Replace(inputString, @"<(a)\b([^>]*)>", @"<$1 target=""_blank""$2>")
Brandon Montgomery
  • Good quick answer, but the regex doesn't account for target="_blank" or target="something else" already being present in the tag. Something to think about. – Platinum Azure May 21 '10 at 18:41
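One way to handle the case raised in the comment above, staying with regex, is to rewrite each anchor tag in two steps: strip any existing target attribute, then add the new one. The following is a minimal sketch under stated assumptions, not a robust HTML rewriter. The names `LinkTargetRewriter` and `ForceBlankTarget` are hypothetical, and the patterns assume that attribute values contain no `>` character and no text that itself looks like a target attribute. It also shares the comment and CDATA caveats mentioned under the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkTargetRewriter
{
    // Matches an opening <a ...> tag; group 1 captures the raw attribute text.
    // The \b keeps tags such as <abbr> or <area> from matching.
    private static readonly Regex AnchorTag =
        new Regex(@"<a\b([^>]*)>", RegexOptions.IgnoreCase);

    // Matches an existing target attribute with a double-quoted,
    // single-quoted, or unquoted value. The leading \s+ means that
    // attributes like data-target are left alone.
    private static readonly Regex TargetAttribute =
        new Regex(@"\s+target\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)",
                  RegexOptions.IgnoreCase);

    // Hypothetical helper: forces target="_blank" on every anchor tag.
    public static string ForceBlankTarget(string html)
    {
        return AnchorTag.Replace(html, match =>
        {
            // Drop any existing target attribute, then add the new one.
            string attributes = TargetAttribute.Replace(match.Groups[1].Value, "");
            return "<a target=\"_blank\"" + attributes + ">";
        });
    }
}
```

Usage would look like `string updated = LinkTargetRewriter.ForceBlankTarget(content);`. The tag name is always written back in lowercase, and for anything beyond simple, well-formed markup the Html Agility Pack answers are the safer choice.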