I have the following Java code that is supposed to extract a URL from a String object:

public static void main() {
    String text = "Link to https://some.domain.com/subfolder?sometext is     available";
    String regex = "https://some\\.domain\\.com/subfolder[^ ]*";
    Pattern urlPattern = Pattern.compile(regex);

    Matcher m = urlPattern.matcher(text);

    String url = m.group();

    System.out.println(url);

    return;
}

However, there is no match and the code fails with IllegalStateException.

What is wrong with the RegEx?

  • String regex = "https:\/\/some\\.domain\\.com/subfolder[^ ]*"; – lordkain Aug 29 '16 at 11:24
  • @lordkain Why do you want to escape the slashes? – J Fabian Meier Aug 29 '16 at 11:25
  • @lordkain that is an illegal escape sequence. Besides, I also tried simply `https.*` which also fails. –  Aug 29 '16 at 11:27
  • I don't think you can use 'group' without calling find or matches methods first. – Ashwinee K Jha Aug 29 '16 at 11:28
  • if(m.matches()){ String url = m.group(); System.out.println(url); } IllegalStateException - If no match has yet been attempted, or if the previous match operation failed – Sanka Aug 29 '16 at 11:31
  • @Sanka no; `.matches()` is a misnomer, see my answer – fge Aug 29 '16 at 11:36
  • https://regex101.com/ says your regex should be: https:\/\/some\.domain\.com\/subfolder[^ ]* This is without escaping for java, so all backslashes should be escaped by a backslash as well... So that makes it `"https:\\/\\/some\\.domain\\.com\\/subfolder[^ ]*"` – Koos Gadellaa Aug 29 '16 at 11:37
  • Another dupe source - http://stackoverflow.com/questions/18257561/simple-java-regex-throwing-illegalstateexception – Wiktor Stribiżew Aug 29 '16 at 11:37

3 Answers

You can't ask a Matcher to give a .group() unless you have called a method which asks the Matcher to operate on the input: one of .find() (preferred), .lookingAt() or .matches().

This is why you get an IllegalStateException.

As to the differences between the three, while the javadoc tells it all, just a quick reminder:

  • .find() does "real" regex matching: it will try and match the regex anywhere in the input text;
  • .lookingAt() adds the constraint that the pattern should match at the beginning of the input text;
  • .matches() is a misnomer since, in addition to the constraint imposed by .lookingAt(), it also requires that the full input text (the "entire region" in the javadoc) is matched.

Please also recall that those three methods return a boolean depending on whether the match was successful; if the result is false, you can't call .group().
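To make the difference between the three methods concrete, here is a minimal sketch that runs the pattern from the question against three inputs. The class name MatcherMethodsDemo and the helper names found, atStart and wholeInput are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

public class MatcherMethodsDemo {
    // Pattern taken from the question.
    static final Pattern URL =
            Pattern.compile("https://some\\.domain\\.com/subfolder[^ ]*");

    // True if the pattern matches anywhere in the input.
    static boolean found(String input) {
        return URL.matcher(input).find();
    }

    // True if the pattern matches starting at the beginning of the input.
    static boolean atStart(String input) {
        return URL.matcher(input).lookingAt();
    }

    // True only if the pattern matches the entire input.
    static boolean wholeInput(String input) {
        return URL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String embedded = "Link to https://some.domain.com/subfolder?sometext is available";
        String leading = "https://some.domain.com/subfolder?sometext is available";
        String exact = "https://some.domain.com/subfolder?sometext";

        for (String input : new String[] {embedded, leading, exact}) {
            System.out.printf("find=%b lookingAt=%b matches=%b%n",
                    found(input), atStart(input), wholeInput(input));
        }
        // prints:
        // find=true lookingAt=false matches=false
        // find=true lookingAt=true matches=false
        // find=true lookingAt=true matches=true
    }
}
```

Only the first input resembles the text in the question, which is why .find() is the method that fits there.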

fge

You forgot to call m.find() or m.matches(). This is mandatory, otherwise group() does not work.

find() returns true if the pattern is matched. Only in that case will group() return what you are expecting.

So, modify your code as follows:

....
if (!m.find()) {
    return;
}
String url = m.group();
...
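For reference, a complete version of the question's program with this fix applied might look like the sketch below. The class name UrlExtractor, the helper extractUrl and the use of Optional are choices made for this example, not part of the original code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
    // Same regex as in the question; compiled once and reused.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https://some\\.domain\\.com/subfolder[^ ]*");

    // Returns the first matching URL, or an empty Optional if there is none.
    static Optional<String> extractUrl(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        // group() is only legal after a successful find().
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "Link to https://some.domain.com/subfolder?sometext is     available";
        // prints: https://some.domain.com/subfolder?sometext
        System.out.println(extractUrl(text).orElse("no match"));
    }
}
```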

EDIT: Concerning which method to call, find() or matches(): find() looks for the pattern anywhere in the string, while matches() must match the full string. They relate like contains() and equals() on strings.

I personally prefer to use find() because in this case the regex fully defines the behavior. If I want to match the full string, I use ^ and $.
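A small sketch of that idea: for inputs without line terminators, find() with an anchored regex should agree with matches() on the unanchored one. The class name AnchoredFindDemo and both helper names are hypothetical, and the non-capturing group is added here so the anchors apply to the whole expression.

```java
import java.util.regex.Pattern;

public class AnchoredFindDemo {
    // find() with explicit anchors: the regex itself demands a full-string match.
    static boolean findAnchored(String regex, String input) {
        return Pattern.compile("^(?:" + regex + ")$").matcher(input).find();
    }

    // matches() on the unanchored regex: the method demands a full-string match.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "https://some\\.domain\\.com/subfolder[^ ]*";
        String exact = "https://some.domain.com/subfolder?sometext";
        String embedded = "Link to " + exact + " is available";

        // prints: true true
        System.out.println(findAnchored(regex, exact) + " " + matchesWhole(regex, exact));
        // prints: false false
        System.out.println(findAnchored(regex, embedded) + " " + matchesWhole(regex, embedded));
    }
}
```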

AlexR

Since, per the javadoc, m.group() "Returns the input subsequence matched by the previous match", you have to call m.matches() or m.find() before using m.group().

piet.t