I am trying to perform multiple string replacements using Java's Pattern and Matcher, where the regex patterns may include metacharacters (e.g. \b, (), etc.). For example, for the input string "fit i am", I would like to apply the replacements:
\bi\b --> EYE
i --> I
I then followed the coding pattern from two questions ("Java Replacing multiple different substring in a string at once" and "Replacing multiple substrings in Java when replacement text overlaps search text"). In both, they build an OR'ed search pattern (e.g. foo|bar) and a Map of (pattern, replacement), and inside the matcher.find() loop they look up and apply the replacement.
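For reference, here is a minimal sketch of that technique in the case it was designed for, where every key is a plain literal string (an assumption that does not hold in this question). The keys are escaped with Pattern.quote and joined with |. Because a literal alternative can only match its own text, matcher.group() doubles as the map key. The class and method names are hypothetical, and the map is assumed to be non-empty.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplace {
    // Replaces every occurrence of each literal key in a single pass.
    // Assumes a non-empty map of plain-string keys, so the matched text equals the key.
    static String replaceLiterals(String input, Map<String, String> replacements) {
        StringBuilder alternation = new StringBuilder();
        for (String key : replacements.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key)); // treat the key as literal text
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // For literal keys, group() is exactly the key that matched.
            String replacement = replacements.get(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("foo", "bar");
        map.put("bar", "foo");
        // Single pass, so the two words swap instead of both ending up as "foo".
        System.out.println(replaceLiterals("foo bar", map)); // prints bar foo
    }
}
```

The lookup works only because each key matches one fixed string; once a key is a real regex such as \bi\b, the matched text no longer identifies the key.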
The problem I am having is that matcher.group() returns only the matched text, not the alternative of the pattern that produced the match, so I cannot tell a match of i apart from a match of \bi\b. Please see the code below. What can I do to fix the problem?
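The matched text alone cannot say which alternative produced it, but capturing groups can. If each alternative is wrapped in its own group, only the group of the alternative that actually matched is non-null after a successful find(), so its number identifies the alternative. A minimal sketch, assuming the alternatives contain no capturing groups of their own (otherwise the numbering shifts). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhichAlternative {
    // Returns, for each match, the 0-based index of the alternative that matched.
    // Assumes no alternative contains its own capturing groups.
    static List<Integer> matchedAlternatives(String input, List<String> alternatives) {
        StringBuilder sb = new StringBuilder();
        for (String alt : alternatives) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append('(').append(alt).append(')'); // group i+1 wraps alternative i
        }
        Matcher m = Pattern.compile(sb.toString()).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            for (int g = 1; g <= alternatives.size(); g++) {
                if (m.group(g) != null) { // groups of non-matching alternatives are null
                    result.add(g - 1);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "\\bi\\b" is listed first, so it wins wherever both alternatives could match.
        System.out.println(matchedAlternatives("fit i am", Arrays.asList("\\bi\\b", "i"))); // prints [1, 0]
    }
}
```

The i inside "fit" has no word boundary before it, so only the second alternative matches there; the standalone i matches the first.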
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.*;

public class ReplacementExample
{
    public static void main(String[] args)
    {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("\\bi\\b", "EYE");
        replacements.put("i", "I");
        String input = "fit i am";
        String result = doit(input, replacements);
        System.out.printf("%s\n", result);
    }

    public static String doit(String input, Map<String, String> replacements)
    {
        // Build one alternation such as "A|B" from all the keys.
        String patternString = join(replacements.keySet(), "|");
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(input);
        StringBuffer resultStringBuffer = new StringBuffer();
        while (matcher.find())
        {
            System.out.printf("match found: %s at start: %d, end: %d\n",
                    matcher.group(), matcher.start(), matcher.end());
            // group() is the matched text ("i"), not the key that matched it.
            String matchedPattern = matcher.group();
            String replaceWith = replacements.get(matchedPattern);
            // Do the replacement here.
            matcher.appendReplacement(resultStringBuffer, replaceWith);
        }
        matcher.appendTail(resultStringBuffer);
        return resultStringBuffer.toString();
    }

    private static String join(Set<String> set, String delimiter)
    {
        StringBuilder sb = new StringBuilder();
        int numElements = set.size();
        int i = 0;
        for (String s : set)
        {
            // Pattern.quote escapes the key, so \b here matches a literal
            // backslash followed by 'b', not a word boundary.
            sb.append(Pattern.quote(s));
            if (i++ < numElements - 1) { sb.append(delimiter); }
        }
        return sb.toString();
    }
}
```
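A separate pitfall in the appendReplacement call: the replacement string is not inserted verbatim. A $ introduces a group reference and a \ escapes the next character, so replacement values containing either can fail or produce unexpected text. Passing them through Matcher.quoteReplacement makes them literal. A small sketch with a hypothetical class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Replaces every "x" in the input with the given text, taken literally.
    static String replaceXLiterally(String input, String replacement) {
        Matcher m = Pattern.compile("x").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$5" would be read as a reference to
            // group 5, which this pattern does not have.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceXLiterally("axb", "$5"));  // prints a$5b
        System.out.println(replaceXLiterally("axb", "C:\\")); // prints aC:\b
    }
}
```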
This prints out:

```
match found: i at start: 1, end: 2
match found: i at start: 4, end: 5
fIt I am
```
Ideally, it should print "fIt EYE am".
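One possible way to get "fIt EYE am", sketched under several assumptions: the keys are kept in a LinkedHashMap so insertion order sets priority (Java tries alternatives left to right and takes the first one that matches at a position, while HashMap iteration order is unspecified); the keys are not passed through Pattern.quote, so they act as regexes; and each key is wrapped in a capturing group whose number is recorded. Because a key such as "(a)(m)" contains capturing groups of its own, the sketch computes each wrapper's group number from Matcher.groupCount() instead of assuming group i+1. It also assumes the keys use no numbered backreferences such as \1, whose numbers would shift once the keys are combined. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReplacer {
    // Applies every (regex -> replacement) pair in one left-to-right pass.
    // Earlier map entries win when several regexes match at the same position.
    static String replaceAll(String input, LinkedHashMap<String, String> replacements) {
        StringBuilder alternation = new StringBuilder();
        List<Integer> wrapperGroups = new ArrayList<>(); // group number wrapping each key
        List<String> values = new ArrayList<>();
        int nextGroup = 1;
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append('(').append(e.getKey()).append(')');
            wrapperGroups.add(nextGroup);
            values.add(e.getValue());
            // Skip past the wrapper plus any capturing groups inside the key itself.
            int innerGroups = Pattern.compile(e.getKey()).matcher("").groupCount();
            nextGroup += 1 + innerGroups;
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            for (int i = 0; i < wrapperGroups.size(); i++) {
                if (m.group(wrapperGroups.get(i)) != null) { // this key produced the match
                    m.appendReplacement(out, Matcher.quoteReplacement(values.get(i)));
                    break;
                }
            }
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> replacements = new LinkedHashMap<>();
        replacements.put("\\bi\\b", "EYE"); // more specific pattern first
        replacements.put("i", "I");
        System.out.println(replaceAll("fit i am", replacements)); // prints fIt EYE am
    }
}
```

Named groups ((?<name>...)) would be another way to find the wrapper group, at the cost of choosing names that cannot clash with names used inside the keys.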