0

I am essentially writing "grep" in Java and don't think I understand the conventions. Assume the following code block:

public class HelloWorld {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("this");
        Matcher m = p.matcher("this file has one line");
        System.out.println(m.matches());
    }
}

The above code prints "false".

To my understanding, the pattern "this" should be found in the string "this file has one line".

Is it an error in my syntax or in my understanding of the conventions of Pattern or Matcher?

EDIT: Given the code:

Matcher m = MY_PATTERN.matcher("FOO[BAR]");
while (m.find()) {
    String s = m.group(1);
    // s now contains "BAR"
}

How could you retrieve the whole line that contains the denoted pattern?

daviscodesbugs
  • 4
    Your code isn't looking for `"this"` inside of a String, but rather it's looking to see if the String completely matches the `"this"` regex String, and it doesn't, and so `false` is correctly returned. – Hovercraft Full Of Eels Sep 19 '15 at 19:32
  • 1
    You may want to learn about `contains` in the String class http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#contains(java.lang.CharSequence) – Michael Sep 19 '15 at 19:33
  • 1
    Try using `find` instead of `matches`. – Andy Turner Sep 19 '15 at 19:34
  • 1
    If you must use regular expressions for this see http://stackoverflow.com/a/600740/1413133 – Manos Nikolaidis Sep 19 '15 at 19:35
  • possible duplicate of [Difference between matches() and find() in Java Regex](http://stackoverflow.com/questions/4450045/difference-between-matches-and-find-in-java-regex) – RealSkeptic Sep 19 '15 at 19:36
  • 2
    Even `grep` doesn't care about "words"; `echo thisissomething|grep this` will print `thisissomething`. With that misunderstanding out of the way, what you want is to use .find(), not .matches(). And yes, .matches() is woefully misnamed and catches everybody the first time (.matches() does not do regex matching as regex matching is defined). – fge Sep 19 '15 at 19:37

4 Answers

4

Matcher.matches() returns true if and only if the whole string matches the given regex.

If you are looking for a partial match, you have the choice:

  • Modify your regex: Pattern.compile(".*this.*"), or
  • Use Matcher.find(): System.out.println(m.find());, or
  • Use String.contains() if you don't need the power of regex but are rather looking for a literal partial match.
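To see the three options side by side, here is a minimal sketch. It is not part of the original answer; the class name `PartialMatchDemo` and its two helper methods are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class PartialMatchDemo {
    // Whole-input match: true only if the regex consumes the entire string.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Partial match: true if the regex matches anywhere in the input.
    public static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "this file has one line";
        System.out.println(fullMatch("this", input));      // false
        System.out.println(fullMatch(".*this.*", input));  // true
        System.out.println(partialMatch("this", input));   // true
        System.out.println(input.contains("this"));        // true
    }
}
```

Note that `.*this.*` only works with `matches()` here because the input has no line terminators; by default `.` does not match them.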
Alex Shesterov
  • 1
    The problem is that there is no "partial match" with regexes; a regex matches or it doesn't. And it can match anywhere within its input, by definition. Which is why the choice of the name `.matches()` is so poor. – fge Sep 19 '15 at 19:41
  • @fge, I do agree that Java's `Matcher.matches()` (and thus `String.matches()`) behavior is counter-intuitive, compared to other regex implementations. Yet this is how it was done in Java... – Alex Shesterov Sep 19 '15 at 19:46
  • Yeah, I am fully aware of it and I am still angry at the poor naming choices after all these years :/ Meh. – fge Sep 19 '15 at 19:52
3

Thank you for all the comments!

My solution is as follows:

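import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// No throws clause needed: a Scanner over a String does no file I/O.
public static void main(String[] args) {
    String s = "this is line one\n"
            + "this is line two\n"
            + "This is line three";

    Pattern p = Pattern.compile("this");

    Scanner scanner = new Scanner(s);
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        Matcher m = p.matcher(line);
        // find() succeeds if the pattern occurs anywhere in the line
        if (m.find()) {
            System.out.println(line);
        }
    }
}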

The above code prints out lines 1 and 2 because they both contain "this" but not line 3 because "This" is capitalized.

If we change the regex to "one", it prints only line 1.
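As a possible alternative to splitting the input first, the whole line can be captured by the regex itself. The following is only a sketch: the class `MatchingLines` and the method `linesContaining` are hypothetical names, and it assumes the user's regex does not itself match line terminators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
public class MatchingLines {
    // Returns every line of text that contains a match of regex.
    public static List<String> linesContaining(String regex, String text) {
        // MULTILINE makes ^ and $ match at line boundaries. By default '.'
        // does not match line terminators, so each match stays on one line.
        Pattern p = Pattern.compile("^.*(?:" + regex + ").*$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group()); // group() with no argument is the whole match
        }
        return lines;
    }

    public static void main(String[] args) {
        String s = "this is line one\n"
                + "this is line two\n"
                + "This is line three";
        // Prints the first two lines; the third has a capital "T".
        linesContaining("this", s).forEach(System.out::println);
    }
}
```

Reading line by line, as above, is usually the simpler choice for large inputs because it does not require the whole text in memory.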

daviscodesbugs
1

Your code isn't checking whether your string contains "this"; it's checking whether the entire string matches the regex "this". So the return value of m.matches() is correct.

You don't need a pattern matcher for the functionality you're looking for. String.contains() does what you want.

If you really need to use regular expressions, then this is probably what you should write:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("this");
        Matcher m = p.matcher("this file has one line");
        System.out.println(m.find()); // will print true
    }
}
PC Luddite
1

If you truly want to replicate grep in Java, you have quite a task ahead of you; for instance, consider the following invocations:

grep something somefile
echo sometext | grep something
grep -x something somefile
grep -Fx something somefile

A lot of options there.
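As a rough illustration of how two of these options might map onto java.util.regex, here is a sketch for a single line of input. The class `GrepOptions`, the method `lineMatches` and its boolean parameters are hypothetical. It also assumes Java's regex syntax, which is not the same as grep's default basic regular expressions, so patterns are not always portable between the two.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
public class GrepOptions {
    // fixed     ~ grep -F: treat the needle as a literal string, not a regex
    // wholeLine ~ grep -x: the match must cover the entire line
    public static boolean lineMatches(String needle, String line,
                                      boolean fixed, boolean wholeLine) {
        int flags = fixed ? Pattern.LITERAL : 0;
        Pattern p = Pattern.compile(needle, flags);
        return wholeLine ? p.matcher(line).matches() : p.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(lineMatches("some.hing", "something else", false, false)); // true
        System.out.println(lineMatches("some.hing", "something else", false, true));  // false
        System.out.println(lineMatches("some.hing", "something", false, true));       // true
        System.out.println(lineMatches("some.hing", "something", true, true));        // false
        System.out.println(lineMatches("some.hing", "some.hing", true, true));        // true
    }
}
```

In this mapping, the whole-input behavior of `.matches()` is exactly what `-x` asks for, while plain grep corresponds to `.find()`.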

Now, let us tackle the fundamentals.

The first problem is that the .matches() method, of either String or Matcher, is a misnomer. By definition, a regex matches an input if it can recognize any text within the input. Unfortunately, this is not how the Java API defines the .matches() method: Java's .matches() returns true if and only if the whole input matches the regular expression.

As such, this is not what you should use. Since you use a Matcher, you want to use .find() instead.

The second problem is line-by-line matching. Regexes in general don't care about lines; it just happens that certain characters (the line terminators) define this notion.

Regexes do not care; however, this notion is important enough that Java has classes that split a character stream into "lines" on the relevant characters. So here is how you would replicate grep by reading from stdin, assuming the default encoding (code written for Java 8+):

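import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.regex.Pattern;

// readLine() and closing the readers can throw IOException
public static void main(final String... args)
    throws IOException
{
    if (args.length == 0)
        throw new IllegalArgumentException("missing pattern as an argument");

    final Pattern pattern = Pattern.compile(args[0]);

    // Fail on malformed input instead of silently replacing bad bytes
    final Charset cs = Charset.defaultCharset();
    final CharsetDecoder decoder = cs.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT);

    try (
        final Reader r = new InputStreamReader(System.in, decoder);
        final BufferedReader reader = new BufferedReader(r);
    ) {
        String line;
        while ((line = reader.readLine()) != null)
            if (pattern.matcher(line).find())
                System.out.println(line);
    }
}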
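Since the code targets Java 8+, the same loop could also be written with streams. This is a sketch rather than a replacement for the above: the class `StreamGrep` and its `grep` method are hypothetical names, and the input comes from a `StringReader` instead of stdin to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; the names are illustrative only.
public class StreamGrep {
    // Collects the lines in which the pattern is found.
    public static List<String> grep(Pattern pattern, BufferedReader reader) {
        // Pattern.asPredicate() tests each string with find(), not matches().
        return reader.lines()
                     .filter(pattern.asPredicate())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String text = "this is line one\nthis is line two\nThis is line three";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            // Prints the first two lines only.
            grep(Pattern.compile("this"), reader).forEach(System.out::println);
        }
    }
}
```

One difference worth keeping in mind: `BufferedReader.lines()` reports read failures as `UncheckedIOException` rather than `IOException`.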
fge