
I am very new to regular expressions, but I am developing an Android app that needs to find every plain-text URL (one that is not already inside a tag) in a string and replace it with

<a href='$link'>$link </a> 

I found this code, which partly works:

text_to_url= text_to_url.replaceAll("(<a[^>]+>)|(http(?s)://.*)", "<a href=\"$0\">$0</a>");

As I said above, I am very new to regex syntax and functions. This code does pick up the URL, but the match does not stop at the end of the URL (I think because of the *).

The problem is that if there are two or more plain-text URLs next to each other or on consecutive lines, they are displayed as a single link (whose target is the first URL).

I have tried many times and searched Google, and this is as far as I got. My regex knowledge is not enough to work out the rest.

Please kindly let me know the answer. Thank you so much for understanding my problem.

Example text:

<h3>Post Title</h3>
<p>This is a paragraph of text of the post</p>
<img src="http://imageurl">
<p>Please read more on this link</p><br/>
http://www.readmorelink.com/1212/1212post
Aung Aung Swe

2 Answers


It looks like the regex you are using is wrong.

Try this:

text_to_url = text_to_url.replaceAll("(?i)\\b((?:[a-z][\\w-]+:(?:\\/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}\\/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))", "<a href=\"$0\">$0</a>");

This regex is not mine; it is John Gruber's, and it is explained well here: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

There are various online editors where you can try out and play around with regexes, such as https://regex101.com/. They are very handy for understanding what is going on.

stamanuel

I can see a minor error in your regex. It should be https? instead of http(?s) to make the s optional. (?s) is an inline modifier that makes . match newline characters as well, which is why your .* carries on across the following lines and swallows the next URLs.
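
To illustrate the difference, here is a minimal sketch that runs both patterns over two URLs on separate lines. The class name DotAllDemo, the helper findAll and the sample input are made up for this example; \S+ is used here only as a simple stand-in for "everything up to the next whitespace".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample text are illustrative only.
class DotAllDemo {

    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "http://a.example/1\nhttps://b.example/2";

        // (?s) switches on DOTALL, so .* runs across the newline to the end of the input.
        System.out.println(findAll("http(?s)://.*", text).size());

        // https? makes the "s" optional; \S+ stops at the first whitespace character.
        System.out.println(findAll("https?://\\S+", text));
    }
}
```

With the first pattern the whole input comes back as a single match; with the second, each URL is its own match.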
As for this part:

but it not stop at end of url (I think according to *)


Yes, you are right: it is because of the *, which is greedy by default. You can make it lazy by adding a ? after it.
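
A small sketch of greedy versus lazy matching, with a made-up class name (LazyQuantifierDemo) and sample text. One caveat it shows: a lazy quantifier at the very end of a pattern matches as little as possible, which is nothing at all, so it needs something after it (here a lookahead for whitespace or end of input, chosen only for illustration) to be useful.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample text are illustrative only.
class LazyQuantifierDemo {

    // Returns the first match of regex in input, or null when there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "see http://a.example/1 and http://b.example/2";

        // Greedy: .* takes everything up to the end of the line.
        System.out.println(firstMatch("http://.*", text));

        // Lazy with nothing after it: .*? matches zero characters.
        System.out.println(firstMatch("http://.*?", text));

        // Lazy, but forced to extend until whitespace or end of input follows.
        System.out.println(firstMatch("http://.*?(?=\\s|$)", text));
    }
}
```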
A better approach, though, would be to use this:

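text_to_url = text_to_url.replaceAll("(?<!\")(https?://[^\\s]*)(?!\")", "<a href=\"$0\">$0</a>");

where [^\s]* (written [^\\s]* in a Java string literal, because the backslash has to be escaped) matches zero or more characters that are not whitespace. \s already covers the newline, so a separate \n is not needed. The (?<!\") lookbehind skips URLs that come directly after a double quote, such as the one in the src attribute of your img tag.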
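
As a rough check, the following sketch applies that replacement to text shaped like the example in the question. The class name Linkify, the method linkify and the second URL are assumptions added for illustration, and the pattern is written with the backslash doubled as a Java string literal requires.

```java
// Hypothetical helper class; the names are illustrative, not an existing API.
class Linkify {

    // The pattern from this answer: a URL not directly preceded by a double quote,
    // running up to the next whitespace character.
    static final String URL_REGEX = "(?<!\")(https?://[^\\s]*)(?!\")";

    // Wraps every plain-text URL in an anchor tag; $0 is the whole match.
    static String linkify(String text) {
        return text.replaceAll(URL_REGEX, "<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        String post = "<img src=\"http://imageurl\">\n"
                + "<p>Please read more on this link</p><br/>\n"
                + "http://www.readmorelink.com/1212/1212post\n"
                + "http://www.example.com/second";
        System.out.println(linkify(post));
    }
}
```

The quoted URL in the img tag is left alone and the two bare URLs become separate links. Note that this sketch assumes a bare URL is followed by whitespace or the end of the text; a URL followed immediately by a tag, such as http://x</p>, would pull the tag into the link, so excluding < from the character class may be worth considering.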

ManzoorWani