I need to find the first complete pair of parentheses in a Java String and, if it is non-nested, return its content. The current issue is that parentheses may be represented by different characters in different locales/languages.
My first idea was, of course, to use regular expressions. But besides the fact that it seems quite difficult (at least to me) to make sure that there are no nested parentheses in the currently considered match if something like "\((.*)\)" is used, there seems to be no class of parenthesis-like characters available in Java's Matcher.
Thus, I tried to solve the problem more imperatively, but ran into the issue that the data I need to process is in different languages, and the parenthesis characters differ depending on the locale. Western: (), Chinese (Locale "zh"): （）
package main;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class FindParentheses {

    static public Set<String> searchNames(final String string) throws IOException {
        final Set<String> foundName = new HashSet<>();
        final BufferedReader stringReader = new BufferedReader(new StringReader(string));
        for (String line = stringReader.readLine(); line != null; line = stringReader.readLine()) {
            final int indexOfFirstOpeningBrace = line.indexOf('(');
            if (indexOfFirstOpeningBrace > -1) {
                final String afterFirstOpeningParenthesis = line.substring(indexOfFirstOpeningBrace + 1);
                final int indexOfNextOpeningParenthesis = afterFirstOpeningParenthesis.indexOf('(');
                final int indexOfNextClosingParenthesis = afterFirstOpeningParenthesis.indexOf(')');
                /*
                 * If the following condition is fulfilled, there is a simple braced expression
                 * after the found product's short name. Otherwise, there may be an additional
                 * nested pair of braces, or the closing brace may be missing, in which cases the
                 * expression is rejected as a product's long name.
                 */
                if (indexOfNextClosingParenthesis > 0
                        && (indexOfNextClosingParenthesis < indexOfNextOpeningParenthesis
                                || indexOfNextOpeningParenthesis < 0)) {
                    final String content = afterFirstOpeningParenthesis.substring(0, indexOfNextClosingParenthesis);
                    foundName.add(content);
                }
            }
        }
        return foundName;
    }

    public static void main(final String[] args) throws IOException {
        for (final String foundName : searchNames(
                "Something meaningful: shortName1 (LongName 1).\n" +
                "Localization issue here: shortName2 （保险丝2）. This one should be found, too.\n" +
                "Easy again: shortName3 (LongName 3).\n" +
                "Yet more random text...")) {
            System.out.println(foundName);
        }
    }
}
The second entry, which uses Chinese parentheses, is not found, but should be. Of course, I could match those characters as an additional special case, but as my project uses 23 languages, including Korean and Japanese, I would prefer a solution that finds any pair of parentheses.
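One direction that might be worth exploring is the Unicode general categories Ps (open punctuation) and Pe (close punctuation), which java.util.regex.Pattern accepts as \p{Ps} and \p{Pe}. The following is only a minimal sketch under that assumption, not a vetted solution: the class name UnicodeParentheses and the method firstSimpleParenthesized are made up for illustration. Note that these categories also cover square brackets, braces, corner brackets and a few quotation marks, and that the sketch does not check whether the opening and closing characters actually belong together.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for this sketch.
public class UnicodeParentheses {

    // ^\P{Ps}*            skip everything before the first opening bracket of any script
    // \p{Ps}              consume that first opening bracket
    // ([^\p{Ps}\p{Pe}]+)  capture a non-empty run containing no opening or closing bracket
    // \p{Pe}              require a closing bracket right after the captured run
    private static final Pattern FIRST_SIMPLE_PAIR =
            Pattern.compile("^\\P{Ps}*\\p{Ps}([^\\p{Ps}\\p{Pe}]+)\\p{Pe}");

    // Returns the content of the first bracket pair in the line if that pair is not nested.
    static Optional<String> firstSimpleParenthesized(final String line) {
        final Matcher matcher = FIRST_SIMPLE_PAIR.matcher(line);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(final String[] args) {
        // ASCII parentheses
        System.out.println(firstSimpleParenthesized("shortName1 (LongName 1)."));
        // prints Optional[LongName 1]

        // Fullwidth parentheses U+FF08 / U+FF09, written as escapes to avoid encoding issues
        System.out.println(firstSimpleParenthesized("shortName2 \uFF08LongName 2\uFF09."));
        // prints Optional[LongName 2]

        // The first pair contains a nested pair, so nothing is returned
        System.out.println(firstSimpleParenthesized("shortName3 (Long (nested) Name)."));
        // prints Optional.empty
    }
}
```

Because the pattern is anchored at the start of the line and \P{Ps}* cannot step over an opening bracket, the match always begins at the first opening bracket; a nested or unclosed first pair therefore yields no match instead of falling through to a later pair.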