Here's my solution:
// imports needed by the snippet below
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
// can extract annotation and text-inside-parentheses
private static final String REGEX = "@(\\w+)\\((.+)\\)";
// read the file
List<String> lines = Files.readAllLines(Paths.get(filename));
// compile the pattern to search for
Pattern pattern = Pattern.compile(REGEX);
// extractor function uses the pattern's second group (text-within-parentheses)
Function<String, String> extractOnlyTextWithinParentheses = s -> {
    Matcher m = pattern.matcher(s);
    m.find(); // succeeds here, because the lines are filtered with the same pattern first
    return m.group(2);
};
// all lines are filtered and the text is extracted using the extractor function
Stream<String> streamOfExtracted = lines.stream()
    .filter(pattern.asPredicate())
    .map(extractOnlyTextWithinParentheses);
// perform the desired operation
streamOfExtracted.forEach(System.out::println);
Explanation:
Let's first clarify what the regex pattern @(\\w+)\\((.+)\\) should do (the backslashes are doubled because it is written as a Java string literal).
ASSUMING: you filter the text for a Java-like annotation such as @MyPattern
matching specific lines using regular expression
@\\w+ matches an at-symbol followed by a word (\\w has a special meaning: it stands for a word character, i.e. a letter, a digit, or an underscore). So it will match any annotation (e.g. @Trace, @User, and so on).
\\(.+\\) matches some text inside parentheses (e.g. ("10869")). The literal parentheses must be escaped as \\( and \\), and .+ matches any non-empty text inside.
Note: unescaped parentheses have a special meaning inside any regular expression, that is grouping & capturing
For matching parentheses and extracting their contents, see this answer on "Pattern to extract text between parenthesis".
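To see the pattern at work, here is a minimal sketch. The class name AnnotationRegexDemo, the helper textInParentheses and the sample lines are made up for illustration; only the regex itself is taken from the solution above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnnotationRegexDemo {
    // same pattern as in the solution above
    static final Pattern PATTERN = Pattern.compile("@(\\w+)\\((.+)\\)");

    // returns the text between the parentheses, or null if the line contains no annotation
    static String textInParentheses(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(textInParentheses("@Trace(\"10869\")"));   // prints "10869" (with the quotes)
        System.out.println(textInParentheses("@User(admin)"));        // prints admin
        System.out.println(textInParentheses("no annotation here"));  // prints null
    }
}
```

Unlike the extractor in the solution, this helper checks the result of find(), so it can also be called on lines that were not filtered beforehand.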
extracting text using capture groups inside regular expression
Simply use parentheses (un-escaped) to form a group and remember their order-number.
(grouped)(Regex)
will match the text groupedRegex
and can extract two groups:
- group #1:
grouped
- group #2:
Regex
To get these groups, call matcher.find() first and then matcher.group(1) or matcher.group(2). Calling matcher.group() without an argument returns the whole match (group #0).
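The group numbering can be checked with a small sketch like the following. The class CaptureGroupDemo and its helper groups are hypothetical names, not part of the solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // returns group 0 (whole match) followed by all capture groups,
    // or an empty array if the regex is not found in the input
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(grouped)(Regex)", "groupedRegex");
        System.out.println(g[0]); // groupedRegex (the whole match)
        System.out.println(g[1]); // grouped
        System.out.println(g[2]); // Regex
    }
}
```

Note that groupCount() does not include group #0, which is why the array has groupCount() + 1 entries.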
option to test the regular expression and extraction
In IntelliJ you can use the action Check RegExp: press ALT+Enter on the selected regex to test and adapt it.
Similarly, there are many websites for testing regular expressions. For example, http://www.regExPlanet.com also supports Java regex syntax, and you can verify the extracted groups online. See example on RegexPlanet.
Note: Besides marking the beginning, as Ole answered above, the caret has one more special meaning: inside a character class it negates, so [^)]+
means match anything (at least 1 character) except a closing parenthesis
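The difference between .+ and [^)]+ only shows up when a line contains more than one closing parenthesis. A small sketch, assuming a made-up sample line with a trailing comment (the class GreedyVsNegatedDemo and its members are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsNegatedDemo {
    // greedy: .+ runs up to the LAST closing parenthesis on the line
    static final String GREEDY = "@(\\w+)\\((.+)\\)";
    // negated character class: stops at the FIRST closing parenthesis
    static final String NEGATED = "@(\\w+)\\(([^)]+)\\)";

    // returns capture group 2 of the first match, or null if there is no match
    static String extract(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "@Trace(\"10869\") // see (ticket)";
        System.out.println(extract(GREEDY, line));  // "10869") // see (ticket
        System.out.println(extract(NEGATED, line)); // "10869"
    }
}
```

For lines that contain just one annotation and nothing else, both variants extract the same text.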
make it extendable using an extractor function
If you replace the extractor function passed to .map(..) above with the following one, you can also print both the annotation name and the text inside the parentheses (tab-separated):
Function<String, String> extractAnnotationAndTextWithinParentheses = s -> {
    Matcher m = pattern.matcher(s);
    m.find();
    StringBuilder sb = new StringBuilder();
    int lastGroup = m.groupCount();
    for (int i = 1; i <= lastGroup; i++) {
        sb.append(m.group(i));
        if (i < lastGroup) sb.append("\t");
    }
    return sb.toString();
};
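Put together with the stream, this could look like the sketch below. It reads from an in-memory list instead of a file so it runs on its own; the class ExtractorUsageDemo, the constant names and the sample lines are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtractorUsageDemo {
    static final Pattern PATTERN = Pattern.compile("@(\\w+)\\((.+)\\)");

    // joins all capture groups of the first match with tabs; empty string if nothing matches
    static final Function<String, String> EXTRACT_ALL_GROUPS = s -> {
        Matcher m = PATTERN.matcher(s);
        if (!m.find()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (i > 1) sb.append('\t');
            sb.append(m.group(i));
        }
        return sb.toString();
    };

    // keeps only lines containing an annotation and maps them with the extractor
    static List<String> extract(List<String> lines) {
        return lines.stream()
                .filter(PATTERN.asPredicate())
                .map(EXTRACT_ALL_GROUPS)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "@Trace(\"10869\")",
                "plain line without annotation",
                "@User(admin)");
        // prints two lines: the annotation name, a tab, then the text inside the parentheses
        extract(lines).forEach(System.out::println);
    }
}
```

Because the extractor is just a Function<String, String>, swapping it for another one does not touch the rest of the stream pipeline.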
Summary:
Your streaming was effective.
Your regular expression was close, but had an error:
- it almost matched, but only on one constant annotation, namely @MyPattern
- you correctly tried capturing using parentheses
- there was a syntax error or typo inside your regular expression: the caret ^
- without the escaped parentheses \\( and \\) you would have extracted not only the text inside but also the parentheses themselves