I am trying to use the Java regex matcher to search and replace. After it failed to match a certain string, I noticed that the expression ".*" does not seem to match certain Unicode characters (in my case, a \u2028 LINE SEPARATOR character).
This is what I have at the moment (match an XML element with any text in between):
String segSourceSearch = "<source(.?)>(.*?)</source>";
String segSourceReplace = "<source$1>$2</source><target$1>$2</target>";
myString = myString.replaceAll(segSourceSearch, segSourceReplace);
The intent is to duplicate each <source> element as a <target> element with the same content.
But how can I modify the regex (.*?) to match any Unicode character between <source> and </source>? Is there a built-in pattern in Java? If not, is there anything in ICU4J that I could use? (I haven't been able to find a regex matcher in ICU4J.)
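
For reference, here is a minimal sketch that reproduces the behaviour in isolation. The class name, method names and sample input are made up for illustration. The second method tries Pattern.DOTALL, which is only a guess on my part, based on the Pattern javadoc listing \u2028 among the line terminators that "." skips by default; I am not sure it is the right fix for real XML content.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the search/replace strings come from my code above.
class LineSeparatorDemo {
    static final String SEARCH = "<source(.?)>(.*?)</source>";
    static final String REPLACE = "<source$1>$2</source><target$1>$2</target>";

    // What I have now: "." uses the default mode and stops at line terminators.
    static String duplicateDefault(String input) {
        return input.replaceAll(SEARCH, REPLACE);
    }

    // Same pattern compiled with DOTALL, so "." also matches line terminators.
    static String duplicateDotAll(String input) {
        return Pattern.compile(SEARCH, Pattern.DOTALL)
                .matcher(input)
                .replaceAll(REPLACE);
    }

    public static void main(String[] args) {
        String input = "<source>foo\u2028bar</source>";

        // No match in default mode, so the input comes back unchanged.
        System.out.println(duplicateDefault(input).equals(input));

        // With DOTALL the element is matched and a <target> copy is appended.
        System.out.println(duplicateDotAll(input).contains("<target>"));
    }
}
```

If the flag does turn out to be the answer, I assume the embedded form "(?s)" at the start of the pattern would let me keep the plain replaceAll call on the string.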