I found this page: https://mathiasbynens.be/demo/url-regex, which nicely lists different regular expressions for URL validation and what each one can handle. Diego Perini's regex is the most powerful one, and I would like to use it in Java. However, it doesn't work when I use it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLValidation {
// "\" replaced by "\\"
private static Pattern REGEX = Pattern.compile("_^(?:(?:https?|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?!10(?:\\.\\d{1,3}){3})(?!127(?:\\.\\d{1,3}){3})(?!169\\.254(?:\\.\\d{1,3}){2})(?!192\\.168(?:\\.\\d{1,3}){2})(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,})))(?::\\d{2,5})?(?:/[^\\s]*)?$_iuS");
    private static String[] URLs = new String[] { "http://foo.com/blah_blah", "http://foo.com/blah_blah/", "http://foo.com/blah_blah_(wikipedia)", "http://foo.bar?q=Spaces should be encoded" };

    public static void main(String[] args) throws Exception {
        for (String url : URLs) {
            Matcher matcher = REGEX.matcher(url);
            if (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
}
This code outputs nothing, but it should output the first three URLs in the array. How do I compile the regex properly to get the code working?
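A likely cause, sketched here under assumptions rather than as a confirmed fix: the leading and trailing `_` are PHP/PCRE delimiters and `iuS` are PCRE modifiers, so Java treats them as literal characters that would have to appear in the input. The sketch below drops them and passes flags to `Pattern.compile` instead. `CASE_INSENSITIVE` stands in for `i`; adding `UNICODE_CASE` is my assumption, and the class name `UrlRegexSketch` and the helper `isValid` are hypothetical.

```java
import java.util.regex.Pattern;

class UrlRegexSketch {
    // Same pattern as in the question, minus the "_" delimiters and the "iuS" suffix.
    static final Pattern URL = Pattern.compile(
        "^(?:(?:https?|ftp)://)"                      // scheme
        + "(?:\\S+(?::\\S*)?@)?"                      // optional user:password@
        + "(?:"                                       // host: public IPv4 or domain name
        + "(?!10(?:\\.\\d{1,3}){3})(?!127(?:\\.\\d{1,3}){3})"
        + "(?!169\\.254(?:\\.\\d{1,3}){2})(?!192\\.168(?:\\.\\d{1,3}){2})"
        + "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"
        + "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"
        + "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
        + "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
        + "|"
        + "(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)"
        + "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*"
        + "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,}))"
        + ")"
        + "(?::\\d{2,5})?(?:/[^\\s]*)?$",              // optional port and path
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Hypothetical helper: true if the whole string is a URL according to the pattern.
    static boolean isValid(String candidate) {
        return URL.matcher(candidate).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://foo.com/blah_blah",
            "http://foo.com/blah_blah/",
            "http://foo.com/blah_blah_(wikipedia)",
            "http://foo.bar?q=Spaces should be encoded"
        };
        for (String url : urls) {
            if (isValid(url)) {
                System.out.println(url);
            }
        }
    }
}
```

Because the pattern is still anchored with `^` and `$`, it only accepts a string that is a URL from start to end.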
upd: Thanks for the suggestions. I tested your regexes in the real application, where I iterate through log files and look for URLs in each line. The log files have timestamps and usernames enclosed in [] and <> respectively, and can sometimes contain special invisible characters responsible for formatting (color, boldness, etc.), such as \u0003. The regex seems to have a problem with that type of string: http://ideone.com/WEcgBY
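One possible way to handle such lines, as a sketch under assumptions: a pattern anchored with `^` and `$` cannot find a URL in the middle of a line, and a control character such as \u0003 is not whitespace, so `[^\s]*` will happily swallow it. The example below replaces control characters with spaces before searching and uses an unanchored pattern. The pattern here is a deliberately simplified stand-in, not Perini's regex, and the names `LogLineUrlSketch` and `firstUrl` as well as the sample log line are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineUrlSketch {
    // Simplified stand-in pattern (assumption): scheme, then anything up to
    // whitespace or one of the bracket characters used in the log format.
    static final Pattern URL = Pattern.compile(
        "\\b(?:https?|ftp)://[^\\s<>\\[\\]]+", Pattern.CASE_INSENSITIVE);

    // Returns the first URL found in the line, or null if there is none.
    static String firstUrl(String line) {
        // Turn invisible formatting characters into spaces so they end a URL
        // instead of becoming part of it.
        String cleaned = line.replaceAll("\\p{Cntrl}", " ");
        Matcher matcher = URL.matcher(cleaned);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Made-up log line with a timestamp, a username and \u0003 color codes.
        String line = "[12:00] <nick> \u000304look\u0003 http://foo.com/blah_blah\u0003 ok";
        System.out.println(firstUrl(line));
    }
}
```

The same cleaning step could be put in front of any other URL pattern, as long as that pattern is not anchored to the start and end of the line.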
upd2: And how about a regex that finds all URLs in a line if it contains several? For example, to use it like this:
String[] urlsFromLine = REGEX.split(line);
for (String url : urlsFromLine) {
System.out.println(url);
}
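Note that `Pattern.split` returns the pieces of text between matches, not the matches themselves, so the snippet above would print everything except the URLs. A minimal sketch of the usual alternative is to call `Matcher.find()` in a loop and collect `group()` each time. The pattern is again a simplified, unanchored stand-in (an assumption, not Perini's regex), and `FindAllUrlsSketch` and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllUrlsSketch {
    // Simplified stand-in pattern (assumption); it must not be anchored with ^ and $.
    static final Pattern URL = Pattern.compile(
        "\\b(?:https?|ftp)://[^\\s<>\\[\\]]+", Pattern.CASE_INSENSITIVE);

    // Collects every match in the line, in order of appearance.
    static List<String> findAll(String line) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(line);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String line = "first http://foo.com/blah_blah then ftp://bar.org/file here";
        for (String url : findAll(line)) {
            System.out.println(url);
        }
    }
}
```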