UPDATE 2
The Incremental Java says:
- Each identifier must have at least one character.
- The first character must be picked from: alpha, underscore, or dollar sign. The first character cannot be a digit.
- The rest of the characters (besides the first) can be from: alpha, digit, underscore, or dollar sign. In other words, it can be any valid identifier character.
Put simply, an identifier is one or more characters selected from alpha, digit, underscore, or dollar sign. The only restriction is the first character can't be a digit.
So, you'd better use
String pattern = "(?:\\b[_a-zA-Z]|\\B\\$)[_$a-zA-Z0-9]*+";
See the regex demo
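As a rough sketch of how this pattern behaves, the snippet below collects every match from a sample line. The class name IdentifierFinder, the method findIdentifiers, and the sample input are made up for illustration; only the pattern itself is taken from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration, not part of any library.
class IdentifierFinder {
    // The UPDATE 2 pattern: a letter or underscore at a word boundary,
    // or a dollar sign at a non-word-boundary, then any identifier characters.
    private static final Pattern IDENTIFIER =
            Pattern.compile("(?:\\b[_a-zA-Z]|\\B\\$)[_$a-zA-Z0-9]*+");

    // Returns every substring of the input that the pattern matches, in order.
    static List<String> findIdentifiers(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [int, $count, _tmp, x1]
        System.out.println(findIdentifiers("int $count = _tmp + 9lives + x1;"));
    }
}
```

The \B\$ branch is needed because $ is not a word character, so there is no \b between a space and a leading $. Note that int is matched as well: the regex only checks the character rules and knows nothing about reserved words.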
UPDATE
According to Representing identifiers using Regular Expression, the identifier regex is [_a-zA-Z][_a-zA-Z0-9]*.
So, you may use
String pattern = "\\b[_a-zA-Z][_a-zA-Z0-9]*\\b";
NOTE that it also matches a run of underscores on its own, such as _______.
To avoid that, you can use
String p = "\\b_*[a-zA-Z][_a-zA-Z0-9]*\\b";
See the IDEONE demo:
String s = "(identifier1 identifier_2 23 4) ____ 33";
String p = "\\b_*[a-zA-Z][_a-zA-Z0-9]*\\b";
System.out.println(s.replaceAll(p, "$0#"));
Output: (identifier1# identifier_2# 23 4) ____ 33
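To see the difference between the two UPDATE patterns side by side, here is a small sketch that runs both over the same string. The class name UnderscoreRunDemo, the constants LOOSE and STRICT, and the helper mark are hypothetical names chosen for this example.

```java
// Hypothetical demo class comparing the two patterns from the UPDATE section.
class UnderscoreRunDemo {
    // Allows an identifier made of underscores only.
    static final String LOOSE = "\\b[_a-zA-Z][_a-zA-Z0-9]*\\b";
    // Requires at least one letter after any leading underscores.
    static final String STRICT = "\\b_*[a-zA-Z][_a-zA-Z0-9]*\\b";

    // Appends '#' after every match so the matched spans are visible.
    static String mark(String input, String pattern) {
        return input.replaceAll(pattern, "$0#");
    }

    public static void main(String[] args) {
        String s = "(identifier1 identifier_2 23 4) ____ 33";
        System.out.println(mark(s, LOOSE));  // (identifier1# identifier_2# 23 4) ____# 33
        System.out.println(mark(s, STRICT)); // (identifier1# identifier_2# 23 4) ____ 33
    }
}
```

Leading underscores are still accepted by the second pattern as long as a letter follows, so __x matches while a lone _ does not.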
OLD ANSWER
You can use the following pattern:
String p = "\\b(?!\\d+\\b)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*\\b";
Or (if a _ can appear at the end):
String p = "\\b(?!\\d+\\b)[A-Za-z0-9]+(?:_[A-Za-z0-9]*)*\\b";
See the regex demo
The pattern requires that the whole word (the expression is enclosed in word boundaries \b) is not just a number (this is checked with the (?!\d+\b) lookahead). The unrolled part [A-Za-z0-9]+(?:_[A-Za-z0-9]+)* matches a chunk of non-underscore word characters followed by zero or more sequences of an underscore and another such chunk.
IDEONE demo:
String s = "(identifier1 identifier_2 23 4) ____ 33";
String p = "\\b(?!\\d+\\b)[A-Za-z0-9]+(?:_[A-Za-z0-9]*)*\\b";
System.out.println(s.replaceAll(p, "$0#"));
Output: (identifier1# identifier_2# 23 4) ____ 33
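The two OLD ANSWER patterns differ only in whether a trailing underscore is accepted. The sketch below illustrates that on an input containing name_; the class name TrailingUnderscoreDemo, the constants, the helper mark, and the sample input are assumptions made for this example.

```java
// Hypothetical demo class comparing the two patterns from the OLD ANSWER section.
class TrailingUnderscoreDemo {
    // Every underscore must be followed by at least one letter or digit.
    static final String NO_TRAILING =
            "\\b(?!\\d+\\b)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*\\b";
    // An underscore may be followed by nothing, so it can end the word.
    static final String ALLOW_TRAILING =
            "\\b(?!\\d+\\b)[A-Za-z0-9]+(?:_[A-Za-z0-9]*)*\\b";

    // Appends '#' after every match so the matched spans are visible.
    static String mark(String input, String pattern) {
        return input.replaceAll(pattern, "$0#");
    }

    public static void main(String[] args) {
        String s = "name_ x_1 42";
        System.out.println(mark(s, NO_TRAILING));    // name_ x_1# 42
        System.out.println(mark(s, ALLOW_TRAILING)); // name_# x_1# 42
    }
}
```

With the first pattern, name_ is rejected as a whole rather than matched as name, because there is no word boundary between e and _.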