3

I am having a hard time adapting the answer in this thread to the following problem:

I would like to split the following string:

my $string = "foo{age}, bar{height}. something_else. baz{weight,so='yes',brothers=john.smith}.test{some}";

on the outer dots, that is, the dots outside any {} group. The result should be an array holding

("foo{age}, bar{height}",
 "something_else",
 "baz{weight,so='yes',brothers=john.smith}",
 "test{some}")

I would like to avoid making assumptions about what's inside the groups inside {}.

How can I do this in Perl?

I tried adapting the following:

print join(",",split(/,\s*(?=\w+{[a-z,]+})/g, $string));

by replacing what's inside the character class [] without success.

Update:

The only characters not allowed within a {} group are { or }

Amelio Vazquez-Reina

3 Answers

5

Since you are not dealing with nested braces, the periods you want are those that are not followed by a closing } before the next opening {. A period inside a group always reaches a } without crossing a {, so split on every period for which that is not the case:

split(/[.]\s*(?![^{]*[}])/g, $string)
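
A minimal runnable sketch of this split, applied to the string from the question. The variable name @parts is only illustrative, and the /g flag is left off because split already splits at every match:

```perl
use strict;
use warnings;

my $string = "foo{age}, bar{height}. something_else. baz{weight,so='yes',brothers=john.smith}.test{some}";

# Split on a period (plus any whitespace after it), unless a '}' can be
# reached from there without first crossing a '{'. That exception is
# exactly the case where the period sits inside a {} group.
my @parts = split /[.]\s*(?![^{]*[}])/, $string;

print "$_\n" for @parts;
# foo{age}, bar{height}
# something_else
# baz{weight,so='yes',brothers=john.smith}
# test{some}
```

The period in john.smith is kept because the lookahead finds the closing } of its group before any {.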

Alternatively, to match the parts you're interested in:

(?:[^.{}]|[{][^{}]*[}])+

Which can be "unrolled" to:

[^.{}]*(?:[{][^{}]*[}][^.{}]*)*
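
As a sketch of how the matching approach could be used, the first pattern is applied below with /g in list context. The trimming step is an addition of this example, not part of the answer: matching leaves the whitespace that followed each dot at the start of the next part.

```perl
use strict;
use warnings;

my $string = "foo{age}, bar{height}. something_else. baz{weight,so='yes',brothers=john.smith}.test{some}";

# One or more of: a single character that is not a dot or a brace,
# or a whole {...} group (which may itself contain dots).
my @parts = $string =~ /(?:[^.{}]|[{][^{}]*[}])+/g;

# Strip the whitespace left over around each part.
s/^\s+|\s+$//g for @parts;

print "$_\n" for @parts;
# foo{age}, bar{height}
# something_else
# baz{weight,so='yes',brothers=john.smith}
# test{some}
```

The + form is used here rather than the unrolled one because the unrolled pattern can match the empty string, which would add empty entries to the list under /g.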
Martin Ender
  • +1. Thanks. This is a good solution, but I updated the OP, since the dots do not necessarily have to be next to a closing `}` – Amelio Vazquez-Reina Aug 01 '13 at 01:23
  • @user815423426 my solution does not assume that (in either of the two approaches) – Martin Ender Aug 01 '13 at 01:24
  • Got it. I guess that you are right, since it's a weak assumption. I can probably work with this assumption, although the spirit of the post is to make *minimal assumptions* (effectively no assumptions) about what's inside `{}` – Amelio Vazquez-Reina Aug 01 '13 at 01:31
  • @m.buettner: forget my previous comment, because I had read the pattern as: `[^\s.{](?:[^{.]*{[^}]*})+` – Casimir et Hippolyte Aug 01 '13 at 01:37
  • Nice solution! Your split is great, although your other two are incorrect. E.g., "foo{age}, bar{height}. baz{weight,so='yes',brothers=john.smith}.test{some}" -> "foo", "", "age", "", ", bar", "", "height", "", "", " baz", "", "weight,so='yes',brothers=john", "", "smith", "", "", "test", "", "some", "", "" – Joseph Myers Aug 01 '13 at 01:37
  • @JosephMyers oh right, they were both missing a quantifier. fixed. – Martin Ender Aug 01 '13 at 10:17
1

Here is how I would have solved the problem:

  1. We define an item:

    my $item = qr/ \w+ (?: [{] [^{}]* [}] )? /x;
    

    That is, some word characters and optionally a section inside braces.

  2. We define item groups, separated by comma:

    my $item_group = qr/$item \s* (?: , \s* $item \s* )*/x;
    

    That is, an $item followed by zero or more comma-item sequences.

  3. We extract the results by matching an item group that is followed by a period or by the end of the string:

    my @result = $string =~ /\G ($item_group) \s* (?: [.] \s* | \z)/xg;
    

Output:

(
  "foo{age}, bar{height}",
  "something_else",
  "baz{weight,so='yes',brothers=john.smith}",
  "test{some}",
)
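
Put together, the three steps above might look like the following script. It is only a sketch: $string is taken from the question, and the print loop is added for illustration.

```perl
use strict;
use warnings;

my $string = "foo{age}, bar{height}. something_else. baz{weight,so='yes',brothers=john.smith}.test{some}";

# Step 1: an item is a word, optionally followed by one {...} section.
my $item = qr/ \w+ (?: [{] [^{}]* [}] )? /x;

# Step 2: an item group is one item plus zero or more ", item" sequences.
my $item_group = qr/$item \s* (?: , \s* $item \s* )*/x;

# Step 3: \G anchors each match where the previous one ended, so the
# string is consumed group by group; each group must be followed by
# a period or by the end of the string.
my @result = $string =~ /\G ($item_group) \s* (?: [.] \s* | \z)/xg;

print "$_\n" for @result;
# foo{age}, bar{height}
# something_else
# baz{weight,so='yes',brothers=john.smith}
# test{some}
```

Unlike the split-based answer, this one does constrain the text outside the braces: each item has to start with word characters, and input that does not fit that shape will stop the \G-anchored matching early.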
amon
0

You can do a match instead of a split: match one or more repetitions of either a brace group (braces with anything except braces between them) or a run of characters that are neither braces nor dots.

Joseph Myers