0

I need to add a "target" attribute to every link that does not already have one.

For example, given this text:

<a href="google.com" onclick="alert('Hello!!')">My Link 1</a>
<a href="my.com" class="some-class">My Link 2</a>
<a href="dot.net" target="_parent" class="some-class">My Link 3</a>
<a href="find.me" class="some-class">My Link 4</a>

The result should be:

<a href="google.com" onclick="alert('Hello!!')" target="_blank">My Link 1</a>
<a href="my.com" class="some-class" target="_blank">My Link 2</a>
<a href="dot.net" target="_parent" class="some-class">My Link 3</a>
<a href="find.me" class="some-class" target="_blank">My Link 4</a>

The third link is untouched; the other links now have a "target" attribute.

Please help me write the correct regular expression. I tried this:

Regex.Replace(text, "<(a)([^>]+)(((?! target=).)*$)([^>]+)>", "<$1 target=\"_parent\" $2 $3>");

but it doesn't work.

I would rather not use Html Agility Pack.

user1820034
  • Don't [parse html with Regex](http://stackoverflow.com/a/1732454/1895201) – DGibbs Mar 04 '13 at 12:29
  • It's impossible to understand what the author was trying to say there. I still want to parse the string with Regex. – user1820034 Mar 04 '13 at 12:37
  • Hmm, do you want some other solution that can do it without Regex? – Pawan Nogariya Mar 04 '13 at 12:38
  • @user1820034 Are you familiar with satire? What's wrong with HTML agility pack? It was built for this sort of task. Why are you pre-emptively rejecting the solution to your problem? – DGibbs Mar 04 '13 at 12:41
  • It should be a small and simple solution. Installing packages or large classes is not desirable. – user1820034 Mar 04 '13 at 12:41
  • You should use a DOM or SAX parser for this. – D3V Mar 04 '13 at 13:14
  • I don't like HTML agility pack either, but I DON'T recommend regex for parsing HTML. I have an app that parses HTML by using SGMLReader to convert the HTML to XML, which I then parse with XMLTextReader. – jomsk1e Mar 04 '13 at 21:49

3 Answers

4

This should work as desired:

Regex.Replace(text, "<a(((?!target=).)*)\">", "<a$1\" target=\"_parent\">")

This assumes that every opening anchor tag ends with a " character immediately before its closing > character.

i.e. <a......">My link</a>
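
The same negative-lookahead idea can be sketched without relying on the tag ending in a " character, by matching `[^>]` instead of `.`. This is only a sketch under stated assumptions: the class name `LinkTargetRewriter` and method `AddBlankTarget` are hypothetical, the value `_blank` is taken from the desired output in the question (the snippet above uses `_parent`), and it assumes no attribute value contains a literal `>`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkTargetRewriter
{
    // Matches an opening <a ...> tag whose attribute list has no "target" attribute.
    // \b keeps tags such as <abbr> from matching; the lookahead is checked
    // before every character, so the match fails as soon as " target=" appears.
    private static readonly Regex AnchorWithoutTarget = new Regex(
        @"<a\b((?:(?!\starget\s*=)[^>])*)>",
        RegexOptions.IgnoreCase);

    // Appends target="_blank" to every anchor that has no target yet.
    // Assumption: attribute values never contain a literal '>'.
    public static string AddBlankTarget(string html)
    {
        return AnchorWithoutTarget.Replace(html, "<a$1 target=\"_blank\">");
    }
}
```

Calling `LinkTargetRewriter.AddBlankTarget(text)` on the sample text from the question should leave the third link alone and append the attribute to the other three.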

Chirag Bhatia - chirag64
  • You can drop the small assumption by replacing the match for any character `.` by the match for any character except a closing bracket `>`, like so: `[^>]`. The full regex becomes: `<a(((?!target=)[^>])*)>` – Jules Colle Dec 11 '21 at 16:00
1

Solution for you:

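// Find every opening anchor tag; Groups[0] is the whole matched tag.
Regex anchorRegex = new Regex("<a (.+?)>");
foreach (Match m in anchorRegex.Matches(text))
{
    string link = m.Groups[0].Value;
    // Only rewrite tags that do not mention "target" at all.
    if (!link.Contains("target"))
        text = text.Replace(link, string.Format("{0} target=\"_parent\">", link.Substring(0, link.Length - 1)));
}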
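
A single-pass variant of the loop above can be sketched with a `MatchEvaluator`, which decides per tag whether to rewrite it and avoids repeated `text.Replace` calls. The class name `AnchorTargetPass`, the method `AddTarget`, and both patterns are illustrative assumptions rather than part of the original answer; it also assumes that attribute values never contain a literal `>`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AnchorTargetPass
{
    // Any opening anchor tag, up to the first '>'.
    private static readonly Regex OpeningAnchor =
        new Regex(@"<a\b[^>]*>", RegexOptions.IgnoreCase);

    // A target attribute preceded by whitespace, so "data-target=" is not counted.
    private static readonly Regex TargetAttribute =
        new Regex(@"\starget\s*=", RegexOptions.IgnoreCase);

    public static string AddTarget(string text, string target)
    {
        return OpeningAnchor.Replace(text, m =>
        {
            string tag = m.Value;

            // Tags that already carry a target attribute are returned unchanged.
            if (TargetAttribute.IsMatch(tag))
                return tag;

            // Drop the trailing '>' and append the new attribute.
            return tag.Substring(0, tag.Length - 1) + " target=\"" + target + "\">";
        });
    }
}
```

For example, `AnchorTargetPass.AddTarget(text, "_blank")` would keep an existing `target="_parent"` as it is, which is the behaviour requested in the comments below.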
Termininja
  • Thanks, but I've tried `Regex.Replace(text, "<(a)([^>]+)>", "<$1 target=\"_parent\" $2>")` already, but it also grabs all links with "target" attribute. I want to exclude them. – user1820034 Mar 04 '13 at 12:55
  • The text contains many links, some with a "target" attribute and some without. So `if(!text.Contains(...))` does not work. – user1820034 Mar 04 '13 at 13:01
  • Thank you for the help. That's good, but I don't want to remove "target='something'". In your example, if "target='something'" exists, it's replaced with "target='_parent'", but it should stay "target='something'". – user1820034 Mar 04 '13 at 13:11
0

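Maybe something simpler, like this?

// Only applies when no link in the text has a target attribute yet.
if (!text.Contains("target="))
{
   // Regex.Replace returns a new string, so the result must be assigned.
   text = Regex.Replace(text, "<(a)([^>]+)>", "<$1$2 target=\"_parent\">");
}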
Xaruth