You can use the regular expression (?<=value\=")(?:[^"\\<]|\\"|\\\\|\\<)++(?=") in combination with Matcher#find() to find all values of the XML attribute value.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumption: StringEscapeUtils is the Apache Commons Lang class; adjust the package to the version you use.
import org.apache.commons.lang.StringEscapeUtils;

String input = "<message>\n <set name=\"..\" value=\"garbled string\" type=\"name\" />\n <set age=\"..\" value=\"32\" />\n <set something=\"..\" value=\"value=\\\"\\\"\\\"\\\"\" />\n ..\n</message>";
Pattern pattern = Pattern.compile("(?<=value\\=\")(?:[^\"\\\\<]|\\\\\"|\\\\\\\\|\\\\<)++(?=\")");
Matcher matcher = pattern.matcher(input);
StringBuilder convertedInput = new StringBuilder();
int trailing = 0; // end index of the previous match
while (matcher.find()) {
    String value = matcher.group();
    String convertedValue = StringEscapeUtils.escapeXml(value);
    // Copy the untouched text between the previous match and this one, then the escaped value.
    convertedInput.append(input.substring(trailing, matcher.start()));
    convertedInput.append(convertedValue);
    trailing = matcher.end();
}
// Copy whatever follows the last match (an empty string if nothing is left).
convertedInput.append(input.substring(trailing));
System.out.println(convertedInput);
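For readers without Commons Lang on the classpath, the same idea can be sketched with only the JDK. This is a minimal sketch, not the library's behaviour: the class name ValueAttributeEscaper and the helper escapeXmlMinimal are hypothetical stand-ins, and the helper only covers the five predefined XML entities. It also uses Matcher#appendReplacement and Matcher#appendTail instead of tracking the trailing index by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValueAttributeEscaper {

    // Same pattern as in the snippet above.
    private static final Pattern VALUE = Pattern.compile(
            "(?<=value\\=\")(?:[^\"\\\\<]|\\\\\"|\\\\\\\\|\\\\<)++(?=\")");

    // Hypothetical stand-in for StringEscapeUtils#escapeXml(String): it only
    // handles the five predefined XML entities and may differ from the library.
    static String escapeXmlMinimal(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes the content of every value="..." attribute and leaves the rest untouched.
    static String escapeValues(String input) {
        Matcher matcher = VALUE.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '\' and '$' in the escaped text literal.
            matcher.appendReplacement(result,
                    Matcher.quoteReplacement(escapeXmlMinimal(matcher.group())));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeValues("<set value=\"a & b\" />"));
    }
}
```

Wrapping the replacement in Matcher.quoteReplacement matters here because the matched values may contain backslashes, which appendReplacement would otherwise treat as escape characters.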
When run, convertedInput should contain input with the content of every value attribute escaped as an XML string; the exact result depends on what StringEscapeUtils#escapeXml(String) does. I added < to the characters that must not appear in a value without a backslash escape because otherwise attributes like name="value=" (thanks to @Thomas for pointing this out in a comment) could make the regular expression match across tag boundaries.
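To illustrate the effect of excluding <, here is a small sketch that compares the pattern with a variant that lacks the restriction. The class name LessThanGuardDemo, the constant names, and the variant pattern are hypothetical and exist only for this comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LessThanGuardDemo {

    // The pattern from the answer: '<' is only allowed when backslash-escaped.
    static final Pattern WITH_GUARD = Pattern.compile(
            "(?<=value\\=\")(?:[^\"\\\\<]|\\\\\"|\\\\\\\\|\\\\<)++(?=\")");

    // Hypothetical variant without the '<' restriction, for comparison only.
    static final Pattern WITHOUT_GUARD = Pattern.compile(
            "(?<=value\\=\")(?:[^\"\\\\]|\\\\\"|\\\\\\\\)++(?=\")");

    // Returns the first match, or null if the pattern does not match at all.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // The name attribute's own content ends in value= and so looks like the start of a value attribute.
        String input = "<set name=\"value=\" />\n<set age=\"32\" />";
        System.out.println(firstMatch(WITHOUT_GUARD, input)); // text spanning two tags
        System.out.println(firstMatch(WITH_GUARD, input));    // null: no match
    }
}
```

Without the restriction, the match starts after name="value=" and runs through the end of the tag into the next one, up to the following quote. With it, the possessive group stops at < and the lookahead fails, so nothing is matched. The restriction only keeps a match inside its own tag: if another attribute follows name="value=" in the same tag, the text up to that attribute's opening quote would still be matched.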
For details on the regular expression used, please visit this link.
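As a rough guide to how the pattern is put together, the following sketch writes the same expression with the Pattern.COMMENTS flag, which lets whitespace and # comments appear inside the pattern. The class name CommentedValuePattern and the helper findValues are hypothetical; the annotations are my reading of each part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedValuePattern {

    static final Pattern VALUE = Pattern.compile(
            "(?<=value\\=\")  # lookbehind: the match must start right after value=\"\n"
          + "(?:              # one unit of the attribute value, which is either\n"
          + "  [^\"\\\\<]     #   any character except a quote, a backslash or <\n"
          + "  | \\\\\"       #   a backslash-escaped quote\n"
          + "  | \\\\\\\\     #   a backslash-escaped backslash\n"
          + "  | \\\\<        #   a backslash-escaped <\n"
          + ")++              # possessive: one or more units, no backtracking\n"
          + "(?=\")           # lookahead: the match must be followed by a quote\n",
            Pattern.COMMENTS);

    // Collects the content of every value="..." attribute found in the input.
    static List<String> findValues(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = VALUE.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(findValues("<set age=\"..\" value=\"32\" />"));
    }
}
```

Because the lookbehind and lookahead are zero-width, the surrounding value=" and closing quote are not part of the match, so matcher.group() yields only the attribute's content.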