
I don't understand why the method returns false with this regex:

Pattern.matches("\\bi", "an is");

The character i is at a word boundary!

xdevel2000

3 Answers


In Java, matches attempts to match a pattern against the entire string.

This is true for String.matches, Pattern.matches and Matcher.matches.

If you want to check whether there's a match anywhere in the string, you can use .*\bi.*, which as a Java string literal is ".*\\bi.*".
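A minimal sketch showing that all three matches variants reject "\\bi" on the question's input, while the .*-padded version succeeds. The class and method names are only illustrative.

```java
import java.util.regex.Pattern;

public class WholeStringMatch {
    static final String INPUT = "an is";

    // All three matches() variants require the regex to cover the ENTIRE input.
    static boolean[] bareRegex() {
        return new boolean[] {
            INPUT.matches("\\bi"),                           // String.matches
            Pattern.matches("\\bi", INPUT),                  // Pattern.matches
            Pattern.compile("\\bi").matcher(INPUT).matches() // Matcher.matches
        };
    }

    // Padding with .* on both sides lets the whole-string match succeed.
    static boolean paddedRegex() {
        return INPUT.matches(".*\\bi.*");
    }

    public static void main(String[] args) {
        for (boolean b : bareRegex()) {
            System.out.println(b);         // false (printed three times)
        }
        System.out.println(paddedRegex()); // true
    }
}
```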



What .* means

As used here, the dot . is a regex metacharacter that means (almost) any character. * is a regex metacharacter that means "zero-or-more repetition of". So for example something like A.*B matches A, followed by zero-or-more of "any" character, followed by B (see on rubular.com).
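A small sketch of the A.*B example in Java (the class and method names are made up for illustration). The last case shows the "(almost)": by default . does not match a line terminator.

```java
public class DotStarDemo {
    // "A.*B": an A, then zero or more of (almost) any character, then a B.
    static boolean aThenB(String s) {
        return s.matches("A.*B");
    }

    public static void main(String[] args) {
        System.out.println(aThenB("AB"));     // true  (zero characters in between)
        System.out.println(aThenB("AxyzB"));  // true
        System.out.println(aThenB("AxyzBq")); // false (matches() needs the whole string)
        System.out.println(aThenB("A\nB"));   // false (. skips line terminators by default)
    }
}
```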


Note that both . and * (as well as other metacharacters) may lose their special meaning depending on where they appear. [.*] is a character class that matches either a literal period . or a literal asterisk *. A preceding backslash also escapes a metacharacter, so a\.b matches "a.b" (written "a\\.b" as a Java string literal).
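A quick sketch of both ways of making . and * literal; the constant names are only illustrative.

```java
public class MetacharEscaping {
    // Inside [...], . and * are literals: this class matches exactly one '.' or '*'.
    static final String DOT_OR_STAR = "[.*]";

    // An escaped dot matches only a literal '.'; the backslash is doubled
    // because it must itself be escaped inside a Java string literal.
    static final String LITERAL_A_DOT_B = "a\\.b";

    public static void main(String[] args) {
        System.out.println(".".matches(DOT_OR_STAR));       // true
        System.out.println("*".matches(DOT_OR_STAR));       // true
        System.out.println("x".matches(DOT_OR_STAR));       // false
        System.out.println("a.b".matches(LITERAL_A_DOT_B)); // true
        System.out.println("axb".matches(LITERAL_A_DOT_B)); // false
        System.out.println("axb".matches("a.b"));           // true: unescaped . matches any char
    }
}
```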


Related problems

Java does not have regex-based endsWith, startsWith, and contains. You can still use matches to accomplish the same things as follows:

  • matches(".*pattern.*") - does it contain a match of the pattern anywhere?
  • matches("pattern.*") - does it start with a match of the pattern?
  • matches(".*pattern") - does it end with a match of the pattern?
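A minimal sketch of those three idioms as helpers. The helper names are hypothetical, and wrapping the pattern in a non-capturing group (?:...) is my own addition so that an alternation such as cat|dog is not split apart by the surrounding .*.

```java
public class MatchesAsContains {
    // Hypothetical helpers built on String.matches; (?:...) keeps alternations intact.
    static boolean containsMatch(String s, String regex) {
        return s.matches(".*(?:" + regex + ").*");
    }

    static boolean startsWithMatch(String s, String regex) {
        return s.matches("(?:" + regex + ").*");
    }

    static boolean endsWithMatch(String s, String regex) {
        return s.matches(".*(?:" + regex + ")");
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("abc123def", "\\d+"));    // true
        System.out.println(startsWithMatch("abc123", "\\d+"));     // false
        System.out.println(endsWithMatch("abc123", "\\d+"));       // true
        System.out.println(containsMatch("a cat sat", "cat|dog")); // true
    }
}
```

Without the group, ".*cat|dog.*" is read as ".*cat" or "dog.*", so "a cat sat" would not match. Also, because . does not match line terminators by default, these helpers can miss matches in multi-line input unless the (?s) flag is used.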

String API quick cheat sheet

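Here's a quick cheat sheet that lists which java.lang.String methods are regex-based and which aren't:

  • Regex-based: matches, replaceAll, replaceFirst, split
  • Not regex-based: equals, contains, startsWith, endsWith, indexOf, replace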
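A short sketch of why the distinction matters, using a dot as the argument (class and method names are illustrative). The literal method changes only the dots; the regex-based ones treat . as "any character".

```java
import java.util.Arrays;

public class RegexVsLiteral {
    static final String INPUT = "a.b.c";

    // replace() treats its arguments literally: only the dots change.
    static String literalReplace() { return INPUT.replace(".", "-"); }

    // replaceAll() treats its first argument as a regex: . matches every character.
    static String regexReplace() { return INPUT.replaceAll(".", "-"); }

    // split() is regex-based too: "." makes every character a delimiter, all tokens
    // are empty, and trailing empty strings are dropped, leaving an empty array.
    static String[] splitOnAnyChar() { return INPUT.split("."); }

    // Escaping the dot splits on literal periods instead.
    static String[] splitOnDot() { return INPUT.split("\\."); }

    public static void main(String[] args) {
        System.out.println(literalReplace());                  // a-b-c
        System.out.println(regexReplace());                    // -----
        System.out.println(Arrays.toString(splitOnAnyChar())); // []
        System.out.println(Arrays.toString(splitOnDot()));     // [a, b, c]
    }
}
```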

polygenelubricants
  • where do you find the syntax of .* ??? Thanks – xdevel2000 Jul 08 '10 at 09:42
  • You mentioned pretty much every string or regex method except [find](http://download.oracle.com/docs/cd/E17409_01/javase/7/docs/api/java/util/regex/Matcher.html#find%28%29). :) – Matthew Flaschen Jul 08 '10 at 10:15
  • @Matthew: yeah I specifically only list the ones in `java.lang.String`. I mean, I can write an essay if I really want to cover everything (e.g. compiling). I'm not sure if I really should, though. – polygenelubricants Jul 08 '10 at 10:21
  • Better to use `contains(CharSequence s)` with strings that contain symbols such as `%`. I ran into this problem: `matches(".*pattern.*")` returned false when comparing `aaa (%)` with the exact same string, although `equals(String s)` and `contains(CharSequence s)` returned true. – FunnyJava Apr 03 '17 at 09:38
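The failure in the last comment is most likely caused by the parentheses rather than by %: ( and ) are regex metacharacters that form a group, so the pattern built from aaa (%) expects "aaa %". That reading is an assumption, since the exact code isn't shown. When a regex method has to be fed literal text, Pattern.quote escapes all of it. A minimal sketch (class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class QuoteLiteral {
    static final String TEXT = "aaa (%)";

    // Unquoted: "(%)" is a capturing group matching just "%", so the regex
    // expects "aaa %" and the parentheses in TEXT make the match fail.
    static boolean unquoted() {
        return TEXT.matches(".*" + TEXT + ".*");
    }

    // Pattern.quote wraps the text in \Q...\E so every character is literal.
    static boolean quoted() {
        return TEXT.matches(".*" + Pattern.quote(TEXT) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(unquoted());          // false
        System.out.println(quoted());            // true
        System.out.println(TEXT.contains(TEXT)); // true, with no regex involved at all
    }
}
```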

The whole string has to match if you use matches:

Pattern.matches(".*\\bi.*", "an is")

This allows 0 or more characters before and after. Or:

boolean anywhere = Pattern.compile("\\bi").matcher("an is").find();

will tell you whether any substring matches (true in this case). As a side note, compiling a regex once and reusing the resulting Pattern can improve performance.
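A minimal sketch of the "compile once, keep it around" idea, assuming a static final field is an acceptable place to hold the Pattern. The field and method names are hypothetical. Pattern instances are immutable and safe to share between threads; Matcher instances are not, so a fresh one is created per call.

```java
import java.util.regex.Pattern;

public class PrecompiledPattern {
    // Compiled once when the class is initialized, then reused by every call.
    private static final Pattern WORD_STARTING_WITH_I = Pattern.compile("\\bi");

    // Each call creates a lightweight Matcher instead of recompiling the regex.
    static boolean containsWordStartingWithI(String s) {
        return WORD_STARTING_WITH_I.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWordStartingWithI("an is")); // true
        System.out.println(containsWordStartingWithI("a bit")); // false: the i in "bit" is mid-word
    }
}
```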

Matthew Flaschen

I don't understand why Java decided to go in the opposite direction from languages like Perl, which have supported regex natively for years. I threw away the standard Java regex API and started using my own Perl-style regex library for Java, called MentaRegex. See below for how regex can make sense in Java.

The method matches returns a boolean saying whether we have a regex match or not.

matches("Sergio Oliveira Jr.", "/oliveira/i" ) => true

The method match returns an array with the matched groups. So it not only tells you whether there is a match, but also returns the matched groups when there is one.

match("aa11bb22", "/(\\d+)/g" ) => ["11", "22"]
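For comparison, a minimal sketch of collecting the same groups with only the standard java.util.regex API (this is not MentaRegex code; the helper name allGroup1 is hypothetical). A "global" match corresponds to looping over find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllGroups {
    // Loop find() and collect capturing group 1 of every hit.
    static List<String> allGroup1(String input, String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allGroup1("aa11bb22", "(\\d+)")); // [11, 22]
    }
}
```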

The method sub allows you to perform substitutions with regex.

sub("aa11bb22", "s/\\d+/00/g" ) => "aa00bb00"
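For reference, a sketch of the same substitution with plain java.lang.String (not MentaRegex; the class and method names are illustrative): replaceAll replaces every match, like the /g flag, while replaceFirst replaces only the first.

```java
public class GlobalSubstitution {
    // Every run of digits is replaced, similar to s/\d+/00/g.
    static String global(String s) { return s.replaceAll("\\d+", "00"); }

    // Only the first run of digits is replaced, similar to s/\d+/00/ without g.
    static String first(String s) { return s.replaceFirst("\\d+", "00"); }

    public static void main(String[] args) {
        System.out.println(global("aa11bb22")); // aa00bb00
        System.out.println(first("aa11bb22"));  // aa00bb22
    }
}
```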

It supports global and case-insensitive regexes:

match("aa11bb22", "/(\\d+)/" ) => ["11"]
match("aa11bb22", "/(\\d+)/g" ) => ["11", "22"]
matches("Sergio Oliveira Jr.", "/oliveira/" ) => false
matches("Sergio Oliveira Jr.", "/oliveira/i" ) => true
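In the standard API, the closest equivalent to a trailing /i is the Pattern.CASE_INSENSITIVE flag or the inline (?i) flag. A minimal sketch (not MentaRegex; the helper name is hypothetical) that uses find() so no .* padding is needed:

```java
import java.util.regex.Pattern;

public class CaseInsensitiveFind {
    // CASE_INSENSITIVE plays the role of /i; find() searches anywhere in the input.
    static boolean containsIgnoreCase(String input, String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        String name = "Sergio Oliveira Jr.";
        System.out.println(containsIgnoreCase(name, "oliveira"));                // true
        System.out.println(Pattern.compile("oliveira").matcher(name).find());    // false
        // The inline (?i) flag does the same inside the pattern string itself.
        System.out.println(name.matches("(?i).*oliveira.*"));                    // true
    }
}
```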

It also allows you to change the escape character in case you don't like seeing so many '\'.

match("aa11bb22", "/(\\d+)/g" ) => ["11", "22"]
match("aa11bb22", "/(#d+)/g", '#' ) => ["11", "22"]
TraderJoeChicago