The problem is the .*? at the end of your regex. It never consumes any text: a lazy quantifier first tries to match nothing, and it only expands when the rest of the pattern fails to match. Here, nothing follows the final .*?, so the regex engine can return a valid match right away, with that last .*? matching an empty string.
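To see the effect in isolation, here is a minimal sketch. The pattern and the sample string are made up for illustration (they are not your original regex); the only point is that the lazy group at the very end comes back empty.

```dart
// Hypothetical pattern for illustration: a lazy .*? is the last element.
final lazyTail = RegExp(r'\[B\](.*?)\[/B\](.*?)');

void main() {
  const text = 'test1 [B]bold text[/B] test2';
  final m = lazyTail.firstMatch(text)!;
  print('group 1: "${m.group(1)}"'); // group 1: "bold text"
  print('group 2: "${m.group(2)}"'); // group 2: ""
  // The match ends right after [/B], so the tail is left unconsumed.
  print('rest: "${text.substring(m.end)}"'); // rest: " test2"
}
```

The first lazy group is forced to expand because \[/B\] has to match after it; the second one has nothing after it, so it stays empty.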
One possible solution is splitting the string with a regex that keeps captured substrings in the output. Unfortunately, it is not directly supported by Dart, so I enhanced this solution to account for your case:
extension RegExpExtension on RegExp {
  // Returns one list per match: the text preceding the match, optionally the
  // whole match, then the first grpnum capturing groups. The final list holds
  // the text left after the last match.
  List<List<String?>> allMatchesWithSep(String input, int grpnum, bool includematch, [int start = 0]) {
    var result = List<List<String?>>.empty(growable: true);
    for (var match in allMatches(input, start)) {
      var res = List<String?>.empty(growable: true);
      // Text between the previous match (or the start) and this match.
      res.add(input.substring(start, match.start));
      if (includematch) {
        res.add(match.group(0));
      }
      for (int i = 0; i < grpnum; i++) {
        res.add(match.group(i + 1));
      }
      start = match.end;
      result.add(res);
    }
    // Whatever remains after the last match.
    result.add([input.substring(start)]);
    return result;
  }
}

extension StringExtension on String {
  List<List<String?>> splitWithDelim(RegExp pattern, int grpnum, bool includematch) =>
      pattern.allMatchesWithSep(this, grpnum, includematch);
}

void main() {
  String text = "test1 [B]bold text[/B] test2 [U]underlined[/U] test3";
  RegExp rx = RegExp(r"\[[UBI]\]([\w\W]*?)\[\/[UBI]\]");
  print(text.splitWithDelim(rx, 1, true));
}
Output:
[[test1 , [B]bold text[/B], bold text], [ test2 , [U]underlined[/U], underlined], [ test3]]
Note the pattern now contains just one capturing group, which is why the grpnum value (the number of groups) is 1. Since you need the whole match in the results, includematch is set to true.
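If the captured groups are not needed and a flat list of pieces is enough, a lighter variant can be sketched with the built-in String.splitMapJoin, which calls back for both matched and non-matched parts. The helper name splitKeepingMatches is made up for this example, and this is only an alternative sketch, not a replacement for the extension above.

```dart
// Hypothetical helper: collects matched and non-matched pieces in order.
List<String> splitKeepingMatches(String input, RegExp pattern) {
  final parts = <String>[];
  input.splitMapJoin(
    pattern,
    onMatch: (m) {
      parts.add(m.group(0)!); // the whole match, e.g. [B]bold text[/B]
      return '';
    },
    onNonMatch: (s) {
      parts.add(s); // text between matches
      return '';
    },
  );
  return parts;
}

void main() {
  final rx = RegExp(r'\[[UBI]\]([\w\W]*?)\[/[UBI]\]');
  final parts = splitKeepingMatches(
      'test1 [B]bold text[/B] test2 [U]underlined[/U] test3', rx);
  print(parts.length); // 5
  print(parts[1]); // [B]bold text[/B]
}
```

The joined string that splitMapJoin returns is ignored here; only the side effect of collecting the pieces is used.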
The [\w\W] construct matches any character, including line break characters, while the dot (.) does not match them by default.
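A small sketch of the difference, using a made-up input that contains a line break inside the tag. It also shows the dotAll flag of the RegExp constructor, which makes the dot match line breaks and can be used instead of [\w\W]. The helper matchesAcrossLines exists only for this example.

```dart
// Hypothetical helper: tests a pattern against text with a line break in it.
bool matchesAcrossLines(RegExp rx) => rx.hasMatch('[B]bold\ntext[/B]');

void main() {
  // Plain dot: stops at the line break, so there is no match.
  print(matchesAcrossLines(RegExp(r'\[B\](.*?)\[/B\]'))); // false
  // [\w\W] matches any character, line breaks included.
  print(matchesAcrossLines(RegExp(r'\[B\]([\w\W]*?)\[/B\]'))); // true
  // dotAll: true lets the dot match line breaks as well.
  print(matchesAcrossLines(RegExp(r'\[B\](.*?)\[/B\]', dotAll: true))); // true
}
```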