This gets into the thorny business of dealing with matching delimiters, possibly nested.
Instead of wrangling one grand regex, I'd suggest parsing the string for the text that lies outside all pairs of balanced (top-level) brackets, which is precisely what the question describes, using the core module Text::Balanced:
use warnings;
use strict;
use feature 'say';
use Text::Balanced qw(extract_bracketed);
my $string = 'hello (hi that [is] so cool) awesome {yeah}';
my @outside_of_brackets;
my ($match, $before);
my $remainder = $string;
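while (1) {
    # In list context: the bracketed match, what follows it, and the skipped prefix
    ($match, $remainder, $before) = extract_bracketed(
        $remainder, '(){}[]', '[^({[]*'
    );
    # On failure $before is undef, so take the bracket-free $remainder instead
    push @outside_of_brackets, $before // $remainder;
    last if not defined $match;
}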
say for @outside_of_brackets;
We ask for the first top-level balanced pair of any of the given brackets,† and along with it we get what follows the pair ($remainder) and what came before it ($before).
It is $before that is needed here, so we keep parsing the $remainder the same way, collecting each $before, until there are no more matches. At that point the $remainder has no brackets in it so we take it as well ($before is undefined after a failed match, which is why the code falls back on $remainder with the // operator).
The code yields the expected strings, with some surrounding whitespace (and an empty last string when the input ends with a bracket pair); trim and filter as needed.
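As a rough sketch of that cleanup, the same loop can be wrapped in a small helper that trims each piece and drops the empty ones. The sub name text_outside_brackets is made up for illustration, and the non-destructive s///r used for trimming assumes Perl 5.14 or newer.

```perl
use strict;
use warnings;
use feature 'say';
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper (the name is not from the answer): returns the trimmed,
# non-empty pieces of text found outside all top-level bracket pairs
sub text_outside_brackets {
    my ($remainder) = @_;
    my @pieces;
    while (1) {
        my ($match, $rest, $before) =
            extract_bracketed($remainder, '(){}[]', '[^({[]*');
        # On failure $before is undef and $rest is the whole unparsed string
        push @pieces, $before // $rest;
        last if not defined $match;
        $remainder = $rest;
    }
    # Strip leading/trailing whitespace, then drop pieces that end up empty
    return grep { length } map { s/^\s+|\s+$//gr } @pieces;
}

say "[$_]" for text_outside_brackets('hello (hi that [is] so cool) awesome {yeah}');
```

For the example string this should print [hello] and [awesome], one per line.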
For another example, and for another approach using Regexp::Common, see this post.
† extract_bracketed extracts the first top-level balanced pair of brackets, which by default must be found at the beginning of the string (after optional whitespace) or right after the end of its previous match; or, if a third argument is given, right after that pattern, which then must match (thus the * quantifier here, in case the brackets are at the very beginning).
So in this case it skips up to the first opening bracket and then parses the string for a balanced bracket pair. The types of brackets to look for are given as its second argument.
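The effect of the third argument can be seen in a minimal comparison; the sample string and variable names below are only for illustration. Each call gets its own copy of the string so that no match position carries over between them.

```perl
use strict;
use warnings;
use feature 'say';
use Text::Balanced qw(extract_bracketed);

my $text = 'hello (hi) there';

# Default prefix: only optional whitespace may precede the opening bracket,
# so with 'hello ' in front the extraction fails and $match is undef
my $first_copy = $text;
my ($match) = extract_bracketed($first_copy, '()');
say defined $match ? "matched: $match" : 'no match with the default prefix';

# Explicit prefix: skip everything up to the first '(' and get it back as well
my $second_copy = $text;
my ($found, $rest, $skipped) = extract_bracketed($second_copy, '()', '[^(]*');
say "match: '$found', skipped: '$skipped', remainder: '$rest'";
```

The first call is expected to report no match, while the second returns the bracketed text together with the skipped prefix and the remainder.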