Consider the following string:

blah, foo(a,b), bar(c,d), yo

I want to extract a list of strings:

blah
foo(a,b)
bar(c,d)
yo

It seems to me that I should be able to use quote words here, but I'm struggling with the regex. Can someone help me out?

Wiktor Stribiżew
Spacemoose
  • I added a solution that handles nested parentheses. It's probably slower than @stribizhev's, though, so if you don't need to handle that, use theirs. – Lynn Jul 10 '15 at 10:08
    Also, wild guess, but for the string you wrote, you can simply split on `, ` (note the space). The inner "arguments" don't contain spaces. If your input is like that too, you might as well just do that. – Lynn Jul 10 '15 at 10:10

3 Answers

Perl has a little thing called regex recursion, so you might be able to look for:

  • either a bare word like blah containing no parentheses (\w+)

  • or a "call", like \w+\((?R)(, *(?R))*\)

The total regex is (\w+(\((?R)(, ?(?R))*\))?), which seems to work.
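As a rough sketch of how that pattern could be used, the snippet below wraps it in a helper. The name split_items is hypothetical, and the inner groups are made non-capturing (an adjustment, not part of the regex above) so that a list-context /g match returns only whole matches. It assumes every nested item is itself a word or a call, so bare parentheses such as (a,b) and empty argument lists are not handled.

```perl
use strict;
use warnings;

# split_items is a hypothetical helper name used for illustration only.
# (?R) recurses into the whole pattern, so each argument of a "call"
# may itself be a bare word or another call.
sub split_items {
    my ($string) = @_;
    # No capture groups: in list context, /g returns each whole match.
    return $string =~ /\w+(?:\((?R)(?:, ?(?R))*\))?/g;
}

my @flat   = split_items("blah, foo(a,b), bar(c,d), yo");
my @nested = split_items("blah, foo(a, g(b,c)), yo");

print "$_\n" for @flat, @nested;
```

Because the pattern matches the items themselves rather than the separators, no trimming of surrounding whitespace should be needed afterwards.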

Lynn
You can use the following regex to use in split:

\([^()]*\)(*SKIP)(*F)|\s*,\s*

With \([^()]*\), we match a ( followed by 0 or more characters other than ( or ), and then a ). If that parenthetical construction is found, we fail the match with (*SKIP)(*F), so the regex engine skips past it. Otherwise, we match only a comma surrounded by optional whitespace.

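#!/usr/bin/perl
use strict;
use warnings;

my $string = "blah, foo(a,b), bar(c,d), yo";
# Skip over (...) groups, then split on commas with optional surrounding whitespace.
my @string = split /\([^()]*\)(*SKIP)(*F)|\s*,\s*/, $string;

foreach (@string) {
    print "$_\n";
}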

To account for commas inside nested balanced parentheses, you can use

my @string = split /\((?>[^()]|(?R))*\)(*SKIP)(*F)|\s*,\s*/, $string;

With \((?>[^()]|(?R))*\) we match balanced ()s and fail the match with the verbs (*SKIP)(*F) if they are found. Then we match a comma with optional whitespace around it (so the strings do not have to be trimmed manually later).

For the string blah, foo(b, (a,b)), bar(c,d), yo the result is:

blah
foo(b, (a,b))
bar(c,d)
yo
Wiktor Stribiżew
There is a solution given by Borodin for one of your questions (which is similar to this one). A small change to the regex gives the desired output (this will not work for nested parentheses):

use strict;
use warnings;
use 5.010;

my $line = q<blah, foo(a,b), bar(c,d), yo>;

my @words = $line =~ / (?: \([^)]*\) | [^,] )+ /xg;

say for @words;

Output:

blah
 foo(a,b)
 bar(c,d)
 yo
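
The leading spaces in that output come from [^,] matching the space after each comma. One possible way to drop them, sketched here with the same regex and an extra trimming step that is not part of the original solution:

```perl
use strict;
use warnings;

my $line = q<blah, foo(a,b), bar(c,d), yo>;

# Same match as above: a (...) group or any non-comma character, repeated.
my @words = $line =~ / (?: \([^)]*\) | [^,] )+ /xg;

# Trim leading and trailing whitespace from each element in place.
s/^\s+|\s+$//g for @words;

print "$_\n" for @words;
```

Like the original, this sketch still does not handle nested parentheses.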
serenesat