I am trying to parse links from plain text and I came across this really useful site:

http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without

It includes an example that uses the regex to match URLs, but I am having trouble translating the syntax.

What is the equivalent of this in Java:

$(function() {
  var urlRegEx = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;
  $('#target').html($('#source').html().replace(urlRegEx, "<a href='$1'>$1</a>"));
});

Any help or a solution would be most appreciated.

I am aware of the Pattern and Matcher classes in Java, but I do not know what jQuery's .html() does, so I cannot work out how to implement an equivalent. Thanks in advance.

user1841702
  • Why use Pattern and Matcher if you need to replace? Use `String res = input_str.replaceAll(regex, "$1");`. The regex is the same, just remove the initial and last `/` and double other backslashes (and those backslashes inside `[...]` can all be removed except for `\w`). – Wiktor Stribiżew Dec 15 '16 at 07:43

2 Answers

You do not need to use Pattern and Matcher directly if you only need to replace the matched text; use String#replaceAll instead.

String input_str = "http://www.some.site.com?and=value&s=more\nhttp://10.23.46.134\nemail@me.at.site.com";
String regex = "(([A-Za-z]{3,9}:(?://)?)(?:[-;:&=+$,\\w]+@)?[A-Za-z0-9.-]+|(?:www\\.|[-;:&=+$,\\w]+@)[A-Za-z0-9.-]+)((?:/[+~%/.\\w-]*)?\\??(?:[-+=&;%@.\\w]*)#?(?:[.!/\\\\\\w]*))?";
String res = input_str.replaceAll(regex, "<a href='$0'>$0</a>");
System.out.println(res);
// => 
//  <a href='http://www.some.site.com?and=value&s=more'>http://www.some.site.com?and=value&s=more</a>
//  <a href='http://10.23.46.134'>http://10.23.46.134</a>
//  <a href='email@me.at.site.com'>email@me.at.site.com</a>

The regex is the same: drop the leading / and the trailing /g (Java's replaceAll already replaces every match), then double the remaining backslashes so they survive Java string-literal escaping. The backslashes inside [...] can all be removed except for the one in \w and the escaped literal backslash. The outer capturing group can also be removed, since the $0 backreference gives you the whole match value in the replacement pattern.

See the regex demo and a Java demo.
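If the same replacement runs many times, compiling the pattern once may be worthwhile, which is where Pattern and Matcher come back in. The following is only a sketch of that idea: the class name `Linkifier` and the method `linkify` are hypothetical, and the pattern is the one from the snippet above, unchanged.

```java
import java.util.regex.Pattern;

class Linkifier {
    // Same pattern as in the answer, compiled once so repeated calls can reuse it.
    private static final Pattern URL = Pattern.compile(
        "(([A-Za-z]{3,9}:(?://)?)(?:[-;:&=+$,\\w]+@)?[A-Za-z0-9.-]+"
        + "|(?:www\\.|[-;:&=+$,\\w]+@)[A-Za-z0-9.-]+)"
        + "((?:/[+~%/.\\w-]*)?\\??(?:[-+=&;%@.\\w]*)#?(?:[.!/\\\\\\w]*))?");

    // Hypothetical helper: wraps every match in an anchor tag.
    // $0 in the replacement refers to the whole match.
    static String linkify(String text) {
        return URL.matcher(text).replaceAll("<a href='$0'>$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("See http://10.23.46.134 for details"));
        // => See <a href='http://10.23.46.134'>http://10.23.46.134</a> for details
    }
}
```

Matcher#replaceAll treats `$0` the same way String#replaceAll does, so the replacement string carries over as-is.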

Wiktor Stribiżew
  • Thanks very much, this was really useful. I am also trying to avoid including any attribute tags in the final String. Should this be done using URLSpan? – user1841702 Dec 15 '16 at 09:14
  • You said you were going to use it on plain text. What kind of attribute tags do you mean? – Wiktor Stribiżew Dec 15 '16 at 09:15
  • I just want to highlight the link itself and make it clickable, like so: http://jsbin.com/wotinulonu/edit?html,js,output without including the surrounding attribute tags. The demo on the right is what I would ideally like to achieve. – user1841702 Dec 15 '16 at 09:35
  • What is an *attribute tag*? See [this regex demo](http://fiddle.re/x59bqa) to check the result for your demo text (click *Java* green button there). – Wiktor Stribiżew Dec 15 '16 at 09:40

You can do something like this (adjust the regex to suit your needs):

String originalString = "Please go to http://www.stackoverflow.com";
String newString = originalString.replaceAll("http://.+?(com|net|org)/{0,1}", "<a href=\"$0\">$0</a>");
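Note that the pattern above only handles http:// URLs and that its lazy `.+?` stops at the first "com", "net" or "org" it finds, even in the middle of a host name. One possible adjustment is sketched below; it assumes you also want https and that URLs contain no whitespace. The class name `SimpleLinkifier` and the method `linkify` are hypothetical.

```java
class SimpleLinkifier {
    // Hypothetical helper: a slightly stricter variant of the pattern above.
    //   https?      accepts http and https
    //   \S+?        host characters, lazily, never crossing whitespace
    //   \.(?:...)\b a literal dot, then the TLD, which must end at a word boundary
    //   /?          an optional trailing slash
    static String linkify(String text) {
        return text.replaceAll(
            "https?://\\S+?\\.(?:com|net|org)\\b/?",
            "<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please go to http://www.stackoverflow.com"));
    }
}
```

Like the original, this still ignores paths and query strings, so it is only suitable when the input is known to contain bare host names.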
Rajesh Panchal