Java regular expressions have "\B" as a non-word boundary.
https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
If I have a 'char', how can I check whether it is a non-word boundary?
Thank you.
A boundary has a special meaning: it is a zero-length match, so it cannot be matched against a single character. \b matches the position between a word character and a non-word character (or the start or end of the input), and \B matches every position where \b does not. Also see http://regular-expressions.info/wordboundaries.html.
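To make the zero-length nature of boundaries concrete, here is a minimal sketch that lists the positions where \b and \B match in a sample string. The class name BoundaryPositions, the helper positions, and the sample input "ab cd" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    // Hypothetical helper: returns the start index of every match of the regex.
    // For \b and \B each match is empty, so start() == end().
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Indices in "ab cd": a=0, b=1, space=2, c=3, d=4, end of input=5
        System.out.println(positions("\\b", "ab cd")); // [0, 2, 3, 5]
        System.out.println(positions("\\B", "ab cd")); // [1, 4]
    }
}
```

Every result is a position between characters (or at the start or end of the input), never a character itself.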
However, I understood the question to be whether the given char can mark the start or end of a word, i.e. whether it is a non-word character. From the javadoc which you linked (here is the latest version):
Predefined character classes
.     Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
So a word character matches \w and a non-word character matches \W. So:
String string = String.valueOf(yourChar);
boolean nonWordCharacter = string.matches("\\W");
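If the check runs often, the same idea can be wrapped in a small helper that compiles the pattern once. This is only a sketch; the class name NonWordCheck and the method name isNonWordChar are made up for illustration.

```java
import java.util.regex.Pattern;

class NonWordCheck {
    // Compiled once and reused; String.matches would recompile the regex on every call.
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    // Hypothetical helper: true if c is outside [a-zA-Z_0-9] (the default \w).
    static boolean isNonWordChar(char c) {
        return NON_WORD.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNonWordChar('@')); // true
        System.out.println(isNonWordChar(' ')); // true
        System.out.println(isNonWordChar('a')); // false
        System.out.println(isNonWordChar('_')); // false, underscore is a word character
    }
}
```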
The question is very peculiar, but it's true that a \w on its own is surrounded by \b. Similarly, a \W on its own is surrounded by \B. So for the purpose of word boundary definitions, the start and end of the input (^ and $) behave like non-word characters.
System.out.println("a".matches("^\\b\\w\\b$")); // true
System.out.println("a".matches("^\\b\\w\\B$")); // false
System.out.println("a".matches("^\\B\\w\\b$")); // false
System.out.println("a".matches("^\\B\\w\\B$")); // false
System.out.println("@".matches("^\\b\\W\\b$")); // false
System.out.println("@".matches("^\\b\\W\\B$")); // false
System.out.println("@".matches("^\\B\\W\\b$")); // false
System.out.println("@".matches("^\\B\\W\\B$")); // true
System.out.println("".matches("$$$$\\B\\B\\B\\B^^^")); // true
The last line may be surprising, but such is the nature of anchors.
((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
or, if you want digits to count as part of a word as well:
((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
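Note that the regex class \w also includes the underscore. A minimal regex-free sketch that mirrors the default \w and \W could look like the following; the class name AsciiWordChars and both method names are made up for illustration, and the check is limited to the default (non-Unicode) definition of \w.

```java
class AsciiWordChars {
    // Mirrors the default \w class: [a-zA-Z_0-9]
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
    }

    // Mirrors \W: anything that is not a word character.
    static boolean isNonWordChar(char c) {
        return !isWordChar(c);
    }

    public static void main(String[] args) {
        // Cross-check against the regex for every ASCII character.
        boolean agrees = true;
        for (char c = 0; c < 128; c++) {
            if (isNonWordChar(c) != String.valueOf(c).matches("\\W")) {
                agrees = false;
            }
        }
        System.out.println(agrees);
    }
}
```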
A boundary is a position between two characters, so a character can never be a boundary.
If you want to match a character that is not surrounded by word boundaries, e.g. the character b in abc, then you can use
\B.\B
Remember to escape the backslashes in a Java string, as in
Pattern regex = Pattern.compile("\\B.\\B");
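As a rough usage sketch, the pattern can be applied with Matcher.find to collect every character that has a non-boundary on both sides. The class name InteriorChars, the method name interiorChars, and the sample inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InteriorChars {
    private static final Pattern REGEX = Pattern.compile("\\B.\\B");

    // Hypothetical helper: collects each character with \B on both sides.
    static List<String> interiorChars(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = REGEX.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interiorChars("abc"));   // [b]
        System.out.println(interiorChars("hello")); // [e, l, l]
    }
}
```

The first and last letters are excluded because the start and end of the input form word boundaries next to them.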
Check this answer for a discussion of just what exactly a \b boundary is and how to wrestle your regex into behaving more the way you may want it to.