
I have the following regular expression:

(?i)\b((https?:\/\/www\.)|(https?:\/\/)|(www\.))?(localhost).*\b

and the following URL:

http://localhost:8081/saman/ab/cde/fgh/ijkl.jsf?gdi=ff8081abcdef02a011b0af032170001&ci=

It matches when I try it on both https://regex101.com/ and http://rubular.com/r/kyiKS9OlsM.

But when the URL ends with a special character, it does not match in my Java code:

import java.text.Format;
import java.text.MessageFormat;
import java.util.regex.Pattern;

public class JavaApplication1 {

    // {0} is replaced with the (escaped) domain by MessageFormat
    private static final String URL_MATCH_REGEX = "(?i)\\b((https?:\\/\\/www\\.)|(https?:\\/\\/)|(www\\.))?({0}).*\\b";
    private static final Format format = new MessageFormat(URL_MATCH_REGEX);

    static String regex = "";
    static String url = "http://localhost:8081/saman/ab/cde/fgh/ijkl.jsf?gdi=ff8081abcdef02a011b0af032170001&ci=";

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            regex = format.format(new Object[]{replaceDomainToUseInRegex("localhost")});
            System.out.println(regex);
            Pattern pattern = Pattern.compile(regex);
            // matches() checks whether the whole URL matches the regex
            System.out.println(pattern.matcher(url).matches());
        } catch (Exception e) {
            // Report errors (e.g. a PatternSyntaxException) instead of swallowing them
            e.printStackTrace();
        }
    }

    // Escapes '.', '/' and '?' so they are matched literally in the domain
    private static String replaceDomainToUseInRegex(String domain) {
        return domain.replace(".", "\\.").replace("/", "\\/").replace("?", "\\?");
    }
}

Can anyone help me figure out the issue here?

hetptis

1 Answer


Your problem is that you're using two different kinds of matching. Java's matches() requires the entire string to match the regular expression. regex101.com does not: it reports a match if any substring of your input string matches the regex (the equivalent of Java's Matcher.find()). You can get the same whole-string behavior on regex101.com by putting ^ at the front of the regex and $ at the end; now the entire string must match, and it doesn't.
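A minimal sketch of the difference, using the regex and URL from the question (the class and method names are made up for illustration): matches() fails, find() succeeds, and adding ^ and $ makes find() fail as well, just like anchoring the regex on regex101.com.

```java
import java.util.regex.Pattern;

public class MatchVsFind {
    // Regex and URL copied from the question
    static final String REGEX = "(?i)\\b((https?:\\/\\/www\\.)|(https?:\\/\\/)|(www\\.))?(localhost).*\\b";
    static final String URL = "http://localhost:8081/saman/ab/cde/fgh/ijkl.jsf?gdi=ff8081abcdef02a011b0af032170001&ci=";

    // Whole-string match, as used by the code in the question
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Any-substring match, closer to what regex101.com reports by default
    static boolean someSubstringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("matches():         " + wholeStringMatches(REGEX, URL));
        System.out.println("find():            " + someSubstringMatches(REGEX, URL));
        // Anchoring with ^...$ forces find() to cover the whole input too
        System.out.println("find() with ^...$: " + someSubstringMatches("^" + REGEX + "$", URL));
    }
}
```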

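\b matches a "word boundary": the zero-width position between a word character and a non-word character (in either order), or between a word character and the beginning or end of the string. = is not a word character, so \b doesn't match the position between = and the end of the string. The greedy .* therefore has to back off to the boundary just before the =, which leaves the = unmatched, and matches() fails. Any URL that ends in a non-word character fails the same way.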
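The sketch below (hypothetical class name; the second URL is an invented variant with an extra 1 appended) prints the substring that find() actually matches, which stops right before the trailing =, and shows that once the URL ends in a word character, matches() succeeds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingBoundary {
    // Regex and URL copied from the question
    static final String REGEX = "(?i)\\b((https?:\\/\\/www\\.)|(https?:\\/\\/)|(www\\.))?(localhost).*\\b";
    static final String URL = "http://localhost:8081/saman/ab/cde/fgh/ijkl.jsf?gdi=ff8081abcdef02a011b0af032170001&ci=";

    // Returns the substring matched by find(), or null if nothing matches
    static String firstMatch(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.group() : null;
    }

    static boolean wholeMatch(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The matched text ends at "&ci"; the final '=' is outside the last \b
        System.out.println(firstMatch(URL));
        System.out.println(wholeMatch(URL));
        // Hypothetical URL ending in a word character: \b now holds at the end
        System.out.println(wholeMatch(URL + "1"));
    }
}
```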

ajb
Thanks for the answer. I think in my case, changing the regex as follows will work: "(?i)\b((https?:\/\/www\.)|(https?:\/\/)|(www\.))?(localhost).*" – hetptis Aug 03 '16 at 04:58
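A quick check of that change, assuming the MessageFormat template from the question with only the trailing \b removed (the class and method names are hypothetical). With matches(), the trailing .* now absorbs whatever follows localhost, including a final =.

```java
import java.text.MessageFormat;
import java.util.regex.Pattern;

public class FixedUrlRegex {
    // Template from the question with the trailing \b dropped
    static final String URL_MATCH_REGEX =
            "(?i)\\b((https?:\\/\\/www\\.)|(https?:\\/\\/)|(www\\.))?({0}).*";

    static boolean isUrlFor(String domain, String url) {
        // Same escaping as replaceDomainToUseInRegex in the question
        String escaped = domain.replace(".", "\\.").replace("/", "\\/").replace("?", "\\?");
        String regex = MessageFormat.format(URL_MATCH_REGEX, escaped);
        return Pattern.compile(regex).matcher(url).matches();
    }

    public static void main(String[] args) {
        String url = "http://localhost:8081/saman/ab/cde/fgh/ijkl.jsf?gdi=ff8081abcdef02a011b0af032170001&ci=";
        System.out.println(isUrlFor("localhost", url));
        System.out.println(isUrlFor("localhost", "http://example.com/"));
    }
}
```

Because of the trailing .*, matches() now effectively checks only how the string starts. For single-line input, calling Matcher.lookingAt() on the pattern without the final .* would give the same result.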