I wish to generate a regular expression from a string containing numbers, and then use this as a Pattern to search for similar strings. Example:
String s = "Page 3 of 23";
If I replace every digit with \d:
// requires: import java.util.regex.Pattern;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (Character.isDigit(c)) {
        sb.append("\\d"); // appends the two characters \d
    } else {
        sb.append(c); // copied as-is, so metacharacters are NOT escaped
    }
}
Pattern numberPattern = Pattern.compile(sb.toString());
// equivalent to: Pattern.compile("Page \\d of \\d\\d");
I can use this to match similar strings (e.g. "Page 7 of 47"). My problem is that if I do this naively, regex metacharacters in the input such as (){}- etc. will not be escaped. Is there a library to do this, or an exhaustive list of the characters that I must and must not escape in a regular expression? (I could try to extract them from the Javadocs, but I am worried about missing something.)
Alternatively, is there a library that already does the whole conversion? (At this stage I don't want to use a full Natural Language Processing solution.)
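To make the escaping problem concrete, here is a minimal sketch of one possible approach: instead of listing metacharacters by hand, collect each run of non-digit characters and pass it through the standard Pattern.quote method, which wraps it as a literal. The class name DigitTemplate and the helper toDigitPattern are hypothetical names chosen for illustration, and this is not necessarily the approach taken in any answer. It also assumes ASCII digits only, since Character.isDigit accepts other Unicode digits that \d does not match by default.

```java
import java.util.regex.Pattern;

public class DigitTemplate {

    // Hypothetical helper: every digit becomes \d, and every run of
    // non-digit characters is quoted so that it is matched literally.
    static Pattern toDigitPattern(String s) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                // Flush the pending literal run before emitting \d.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append("\\d");
            } else {
                literal.append(c);
            }
        }
        // Flush whatever literal text follows the last digit.
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = toDigitPattern("Page 3 of 23 (draft)");
        System.out.println(p.matcher("Page 7 of 47 (draft)").matches()); // prints true
        System.out.println(p.matcher("Page 7 of 47 draft").matches());   // prints false
    }
}
```

With the naive version, the parentheses in "(draft)" would be compiled as a capturing group, so the pattern would match "Page 7 of 47 draft" and reject the string with the literal parentheses.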
NOTE: @dasblinkenlight's edited answer now works for me!