
I want to check whether a string is formatted exactly as "WORD1 WORD2 WORD3", where WORD1, WORD2, and WORD3 are arbitrary words. In short, I'm trying to check whether a string contains exactly two whitespace characters and exactly three words, with no numbers and no symbols other than regular letters.

I've looked extensively at other posts about regex in Java, but none of them seem to say how to match exactly n whitespace characters. The similar posts I found only seem to explain how to find strings that consist solely of whitespace or that contain any whitespace at all.

I looked at the Java documentation for the Pattern class on how to match spaces, and it says that [ \t\n\x0B\f\r] matches "a whitespace character", which I believe includes the space, tab, newline, vertical tab, form-feed, and carriage return characters.

But when I implement the code in Java, I don't get what I expect:

import java.util.regex.Pattern;

public class WhiteSpace{
    public static void main(String[] args) {
        boolean b = Pattern.matches("[ \\t\\n\\x0B\\f\\r]", "word word word"); 
        System.out.println(b); // This prints false instead of true even though there are 2 spaces in the string.
    }
}

Even trying just "[ ]" or "\\s" doesn't seem to work. I don't have any luck with quantifiers either, such as x{2}? (to match x exactly twice). And the baffling thing is that when I try out the same thing on a regex tester website (such as regex101.com), I do indeed get the 2 matches that I want.

Some feedback would be appreciated!

  • `Pattern.matches()` tries to match the _entire input string_; you'd need to do `Pattern.compile("[ \\t\\n\\x0B\\f\\r]").matcher("word word word").find()` to find a match within the string. – Kevin Anderson Jul 01 '20 at 03:10
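
To illustrate the comment above, here is a minimal sketch of the difference between `Pattern.matches()` and `Matcher.find()`, plus one way to count whitespace characters. The class name `WhitespaceCount` and the helper `countWhitespace` are made up for this example and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceCount {
    // Hypothetical helper: counts single whitespace characters anywhere in the input.
    static int countWhitespace(String input) {
        Matcher m = Pattern.compile("\\s").matcher(input);
        int count = 0;
        while (m.find()) {   // find() advances to the next match each time
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "word word word";
        // matches() succeeds only if the WHOLE string is one whitespace character.
        System.out.println(Pattern.matches("\\s", input));                // false
        // find() succeeds if the pattern occurs anywhere in the string.
        System.out.println(Pattern.compile("\\s").matcher(input).find()); // true
        System.out.println(countWhitespace(input));                       // 2
    }
}
```

This is likely why a regex tester reports two matches: such tools typically search within the text rather than requiring the pattern to cover the whole input.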

2 Answers


I would use String#matches here, with the following regex pattern:

\S+\s\S+\s\S+

Sample script:

String input = "WORD1 WORD2\tWORD3";
if (input.matches("\\S+\\s\\S+\\s\\S+")) {
    System.out.println("MATCH");
}

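The above pattern matches three runs of non-whitespace characters separated by exactly two single whitespace characters. Because String#matches tests the entire string, nothing may precede the first word or follow the third, so there is no other arrangement that satisfies it.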

Edit:

If you want to only admit "regular" letters in the three words, then use:

(?i)[A-Z]+\s[A-Z]+\s[A-Z]+
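
As a quick sketch of how the letters-only pattern behaves with String#matches (the class name `ThreeWords` and the method `isThreeWords` are illustrative names, not part of the original answer):

```java
class ThreeWords {
    // Hypothetical helper wrapping the letters-only pattern from the answer.
    static boolean isThreeWords(String input) {
        // (?i) makes [A-Z] case-insensitive; \\s matches one whitespace character.
        return input.matches("(?i)[A-Z]+\\s[A-Z]+\\s[A-Z]+");
    }

    public static void main(String[] args) {
        System.out.println(isThreeWords("one two three"));      // true
        System.out.println(isThreeWords("one two\tthree"));     // true, a tab counts as whitespace
        System.out.println(isThreeWords("one  two three"));     // false, two spaces in a row
        System.out.println(isThreeWords("WORD1 WORD2 WORD3"));  // false, digits are not letters
    }
}
```

Note that the literal string "WORD1 WORD2 WORD3" from the question is rejected by this version because it contains digits.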
Tim Biegeleisen
  • ...of course it also matches `"%\n%\n%"`, which at first glance looks like one bad word. – Cary Swoveland Jul 01 '20 at 03:56
  • @CarySwoveland The original requirements were worded "whitespace" characters. Of course, if we want to allow for only space, we can use `\S+ \S+ \S+`. – Tim Biegeleisen Jul 01 '20 at 03:57
  • Incorrect answer. Question said *"no numbers and no symbols other than regular letters"*, so using `\S` is incorrect, since it matches both numbers and symbols. The correct pattern would be `\p{L}`, or `\p{Alpha}`, or `[a-zA-Z]`. If using `\p{L}`, then `\s` should likely be replaced with `\p{Zs}`, and if using `\p{Alpha}`, then `\s` should likely be replaced with `\p{Space}`. – Andreas Jul 01 '20 at 04:01
  • Tim, don't feel bad about misreading the question. I rarely get them right myself. After reading a bit my mind invariably starts twisting the words to make the question more interesting. – Cary Swoveland Jul 01 '20 at 04:12
  • @CarySwoveland I know where the misread happened. I read the title of the question, and the first two sentences, and duped myself into thinking that I had all requirements `:-)` – Tim Biegeleisen Jul 01 '20 at 04:13
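
For the Unicode-aware variant suggested in the comments, a rough sketch could look like this, assuming that "regular letters" should include non-ASCII letters and that only space separators (not tabs or newlines) are allowed. The names `UnicodeThreeWords` and `isThreeUnicodeWords` are invented for illustration.

```java
class UnicodeThreeWords {
    // Hypothetical helper: \p{L} is any Unicode letter, \p{Zs} is any space separator.
    static boolean isThreeUnicodeWords(String input) {
        return input.matches("\\p{L}+\\p{Zs}\\p{L}+\\p{Zs}\\p{L}+");
    }

    public static void main(String[] args) {
        // "\u00e9" is an accented e, written as an escape to avoid source-encoding issues.
        System.out.println(isThreeUnicodeWords("\u00e9\u00e9n twee drie")); // true
        System.out.println(isThreeUnicodeWords("one\ttwo three"));          // false, tab is not in \p{Zs}
        System.out.println(isThreeUnicodeWords("one two thr33"));           // false, digits are not letters
    }
}
```

Whether `\p{L}` or `[a-zA-Z]` is the better fit depends on what the question means by "regular letters", which the thread does not settle.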

Split the string and test each part.

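// Method name is illustrative; the original snippet was a bare method body.
static boolean isThreeWords(String input) {
  var count = 0;
  // Limit -1 keeps trailing empty strings, so "a b c " (three spaces) is rejected.
  for (var s : input.split(" ", -1)) {
    if (s.matches("[a-zA-Z]+")) {
      count++;
    } else {
      return false;
    }
  }
  return count == 3;
}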

It does work:

https://repl.it/repls/TruthfulLuxuriousOmnipage#Main.java

OscarRyz