I have

String s = "<a href=\"https://stackoverflow.com\">https://stackoverflow.com</a><br/><a href=\"https://google.com\">https://google.com</a>";

Now I want to replace all links in the href attributes by prefixing them with a fixed value (e.g. `abc.com?url=`). Here's the result that I want:

String s = "<a href=\"abc.com?url=https://stackoverflow.com\">https://stackoverflow.com</a><br/><a href=\"abc.com?url=https://google.com\">https://google.com</a>";

I tried the following, but it doesn't solve the problem because it replaces every string beginning with http://, not only those within href attributes:

s = s.replaceAll("http://.+?(com|net|org|vn)/{0,1}", "abc.com" + "&url=" + "$0");

What can I do to replace only within the attribute, and not in other content?

Toby Speight
Hoang Nam
    Use an HTML parser. And in general, try to write a program that follows your definitions: look for the `href`, not for the `http`. – RealSkeptic Jun 02 '17 at 08:12

2 Answers

You could use an HTML parser such as jsoup:

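import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String s = "<a href=\"https://stackoverflow.com\">https://stackoverflow.com</a>";
Document document = Jsoup.parse(s);
Elements anchors = document.getElementsByTag("a");
// Overwrite the href attribute of the first anchor.
anchors.get(0).attr("href", "...new href...");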

Alternatively, if this is too heavyweight, a regex should suffice:

<a href="(?<url>[^"]+)">(?<text>[^<]+)<\/a>

Note: if you don't care about the text group, replace ?<text> with ?:

Then rebuild each link from the url and text groups, using a similar approach to this answer
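As a minimal sketch of that replacement step in Java, the named groups can be referenced as `${url}` and `${text}` in the replacement string. The class name `HrefPrefixer` and the method `prefixHrefs` are hypothetical names chosen for illustration, and the pattern assumes anchors written exactly as `<a href="...">text</a>` with no other attributes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class HrefPrefixer {
    // The pattern from above as a Java string literal. In Java the '/' needs no escaping.
    private static final Pattern ANCHOR =
            Pattern.compile("<a href=\"(?<url>[^\"]+)\">(?<text>[^<]+)</a>");

    static String prefixHrefs(String html, String prefix) {
        // quoteReplacement protects any '$' or '\' in the prefix;
        // ${url} and ${text} refer to the named groups of each match.
        String replacement =
                "<a href=\"" + Matcher.quoteReplacement(prefix) + "${url}\">${text}</a>";
        return ANCHOR.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String s = "<a href=\"https://stackoverflow.com\">https://stackoverflow.com</a>"
                + "<br/><a href=\"https://google.com\">https://google.com</a>";
        System.out.println(prefixHrefs(s, "abc.com?url="));
    }
}
```

Only the href value changes; the link text between the tags is copied through unchanged. Anchors with extra attributes, single quotes, or nested tags would not match this pattern, which is where a parser is the safer choice.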

Eduardo

As RealSkeptic said, look for href instead of the link itself; it saves a lot of effort.

var s = '<a href="http://stackoverflow.com">https://stackoverflow.com</a><br/><a href="https://google.com">https://google.com</a>';

s = s.replace(/href="/g, "href=\"abc.com?url=");

console.log(s);
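
Since the question is about Java, here is a rough Java sketch of the same idea. The class name `HrefAttributePrefix` and its method are hypothetical, the prefix `abc.com?url=` is taken from the desired output in the question, and the code assumes every attribute is written exactly as `href="` (double quotes, no extra spaces):

```java
// Hypothetical helper, not part of any library.
final class HrefAttributePrefix {
    static String prefixHrefAttributes(String html, String prefix) {
        // String.replace treats both arguments literally (no regex)
        // and replaces every occurrence, like the /g flag in JavaScript.
        return html.replace("href=\"", "href=\"" + prefix);
    }

    public static void main(String[] args) {
        String s = "<a href=\"http://stackoverflow.com\">https://stackoverflow.com</a>"
                + "<br/><a href=\"https://google.com\">https://google.com</a>";
        System.out.println(prefixHrefAttributes(s, "abc.com?url="));
    }
}
```

The link text is left alone because only the literal `href="` is matched. Note that this would also rewrite `href` attributes on other tags such as `<link>`.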
Peter-Paul