Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?
This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.
For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:
String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);
if (numbers1.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

System.out.println("Regex 2: " + regex2);
if (numbers2.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}
Not surprisingly, the output produced by the above code is:
Regex 1: \d+\.\d+
	Match
Regex 2: \Qd+.d+\E
	No match
That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).
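To make the difference concrete, here is a minimal sketch of what quoting can and cannot do. It is not the escaping method being asked about: it only shows that Pattern.quote() behaves as expected when applied to the literal fragment alone (the separator), while the \d+ parts are written by hand. The class name QuoteLiteralPartsSketch and the helper buildDecimalRegex are made up for illustration.

```java
import java.util.regex.Pattern;

public class QuoteLiteralPartsSketch {

    // Hypothetical helper: quotes only the separator, which must be matched
    // literally, and leaves the digit classes as ordinary regex syntax.
    public static String buildDecimalRegex(String separator) {
        return "\\d+" + Pattern.quote(separator) + "\\d+";
    }

    public static void main(String[] args) {
        String regex = buildDecimalRegex(".");
        System.out.println("Regex: " + regex);              // Regex: \d+\Q.\E\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "1x2"));  // false

        // Quoting the whole expression turns every character into a literal,
        // so it only matches the text "d+.d+" itself.
        String quotedAll = Pattern.quote("d+.d+");
        System.out.println(Pattern.matches(quotedAll, "d+.d+")); // true
        System.out.println(Pattern.matches(quotedAll, "1.2"));   // false
    }
}
```

The result contains \Q...\E markers rather than a backslash-escaped \. so it is still quoting, not escaping, which is exactly the distinction the question is about.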
So, is there a method that would automatically escape each regex meta-character?
If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of Pattern.escape('.') would be the string "\.", but Pattern.escape(',') should just produce ",", since it is not a meta-character. Similarly, Pattern.escape('d') could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean a literal 'd', which wouldn't be misunderstood by the regex interpreter to be something else, as would be the case with '.').
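For illustration, the behaviour described above could be sketched by hand roughly as follows. This is a hypothetical helper, not an existing method of java.util.regex.Pattern; the class name RegexEscapeSketch and the META character list are assumptions made for this example, and the list may not cover every character that is special in every context.

```java
import java.util.regex.Pattern;

public class RegexEscapeSketch {

    // Assumed set of meta-characters to escape; chosen for this sketch only.
    private static final String META = "\\.[]{}()*+-?^$|";

    // Prefixes a backslash if the character is in the assumed meta set,
    // otherwise returns the character unchanged.
    public static String escape(char c) {
        return META.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    // Escapes every character of the input with the rule above.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            sb.append(escape(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape('.'));   // \.
        System.out.println(escape(','));   // ,

        String regex = "\\d+" + escape('.') + "\\d+";
        System.out.println(regex);                          // \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "1x2"));  // false
    }
}
```

Unlike the escape('d') idea above, this sketch leaves letters alone: a backslash in front of a letter changes its meaning rather than making it literal, so turning 'd' into \d would be a different operation from escaping.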