I am coding a script in Perl and need to extract some information from a text file.

This is what my code looks like - the string values are made up but represent all possible string variations.

my @alpha = ("abcdefgh(i) jklmno(pqrs3), uvwxyz", 
             "abcdefghi jklmn(opq1st), uvwxyz",
             "abcdefghi jklmn(o_q(1s3)), uvwxyz",
             "abcdef(gh)i jklmno(pq(1s3)), uvwxyz");

foreach my $line (@alpha){
    if ($line =~ /\((.*\(?.*\)?)\),/){
        print $1;
    }
}

I am trying to capture all of the text inside the last outer set of parentheses (or brackets, for us British English speakers).
Please note that I am using the "dot" metacharacter because I want to match anything: text, numbers, or other special characters.

Essentially I want to print out:

pqrs3
opq1st
o_q(1s3)
pq(1s3)

But I keep getting:

 i) jklmno(pqrs3 <-- not ok
 opq1st <-- this is ok
 o_q(1s3) <-- this is also ok
 gh)i jklmno(pq(1s3) <-- not ok

What am I doing wrong? Or is it even possible to match this way?
Any help is appreciated.

serenesat
Sid5427
  • Do you always want the _last outer_ match? You may need to use a parser here. Have a look at [this SO article](http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns). – Tim Biegeleisen Dec 02 '15 at 05:44
  • Yes I do, and it's not necessary to have a single line of regex. I did try to get everything between the first parenthesis and the last, but that did not work either. – Sid5427 Dec 02 '15 at 05:48
  • One other question: Do you know how many parentheses groups will be present? It seems to vary between 1 and 2 in your examples. – Tim Biegeleisen Dec 02 '15 at 05:52
  • The last string in the array has the maximum number of possible parentheses groups. There aren't any other possible groups present in my data. – Sid5427 Dec 02 '15 at 05:57
  • You need to deal with the possibility of a recursive regex for nested parentheses. – Tim Biegeleisen Dec 02 '15 at 06:05
  • We British English speakers also use *parentheses*. The difference is that, in the UK, a *bracket* is generally a parenthesis, whereas in American English it is understood to be a square bracket – Borodin Dec 02 '15 at 07:52
  • @Borodin - the way I know it is Brackets (), square brackets [] and braces {} (or curly braces)... gets confusing talking to coders in the US. – Sid5427 Dec 02 '15 at 08:11
  • *“Parentheses”* is fine everywhere – Borodin Dec 02 '15 at 08:14

2 Answers

(\((?:[^()]|(?1))*\))(?!.*\()

You can use a recursive regex here. See the demo:

https://regex101.com/r/hE4jH0/21
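To try the pattern from Perl directly, here is a minimal sketch. It assumes the regex above is used unchanged; since its group 1 includes the enclosing parentheses, the hypothetical helper `last_paren_content` (not part of the original answer) strips them with `substr`. The `(?1)` recursion needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my @alpha = ("abcdefgh(i) jklmno(pqrs3), uvwxyz",
             "abcdefghi jklmn(opq1st), uvwxyz",
             "abcdefghi jklmn(o_q(1s3)), uvwxyz",
             "abcdef(gh)i jklmno(pq(1s3)), uvwxyz");

# Pattern from the answer: group 1 is a balanced (...) group, and the
# negative lookahead rejects it if another "(" appears later in the line.
my $last_group = qr/(\((?:[^()]|(?1))*\))(?!.*\()/;

# Hypothetical helper: returns the text inside the last outer (...) group,
# or undef (empty list in list context) when nothing matches.
sub last_paren_content {
    my ($line) = @_;
    return unless $line =~ $last_group;
    return substr($1, 1, -1);    # drop the enclosing "(" and ")"
}

foreach my $line (@alpha) {
    my $content = last_paren_content($line);
    print "$content\n" if defined $content;
}
```

With the four sample strings this should print `pqrs3`, `opq1st`, `o_q(1s3)` and `pq(1s3)`, one per line.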

vks

Here is a way that works with the given strings:

use warnings;
use strict;

my @alpha = ("abcdefgh(i) jklmno(pqrs3), uvwxyz", 
             "abcdefghi jklmn(opq1st), uvwxyz",
             "abcdefghi jklmn(o_q(1s3)), uvwxyz",
             "abcdef(gh)i jklmno(pq(1s3)), uvwxyz");

foreach my $line (@alpha)
{
    if ( $line =~ m/.*\s+\w+\((.*)\),\s+\w+/ )
    {
        print $1, "\n";
    }
}

Output:

pqrs3
opq1st
o_q(1s3)
pq(1s3)
serenesat
  • The spacing you're using as an anchor is probably not what the OP wants. – Miller Dec 02 '15 at 06:46
  • @Miller is right - I did not intend to use the spacing as an anchor, but looking at it now - it looks like a good alternative as well. – Sid5427 Dec 02 '15 at 06:52
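If relying on the spacing is a concern, one alternative is the small hand-written scan suggested in the comments on the question, which avoids regex anchors entirely. The sketch below is only an illustration: the helper name `last_group_by_scan` is made up, and it assumes the group of interest is the one closed by the last `)` in the line. It walks backwards from that `)` and counts nesting depth until the matching `(` is found.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside the (...) group that is
# closed by the last ")" in the line, or undef if no balanced group exists.
sub last_group_by_scan {
    my ($line) = @_;
    my $close = rindex($line, ')');
    return unless $close >= 0;

    my $depth = 0;
    for (my $i = $close; $i >= 0; $i--) {
        my $ch = substr($line, $i, 1);
        if ($ch eq ')') {
            $depth++;                      # entering a (possibly nested) group
        }
        elsif ($ch eq '(') {
            $depth--;                      # leaving a group
            # Depth 0 means this "(" pairs with the last ")".
            return substr($line, $i + 1, $close - $i - 1) if $depth == 0;
        }
    }
    return;                                # unbalanced parentheses
}

my @alpha = ("abcdefgh(i) jklmno(pqrs3), uvwxyz",
             "abcdefghi jklmn(opq1st), uvwxyz",
             "abcdefghi jklmn(o_q(1s3)), uvwxyz",
             "abcdef(gh)i jklmno(pq(1s3)), uvwxyz");

foreach my $line (@alpha) {
    my $content = last_group_by_scan($line);
    print "$content\n" if defined $content;
}
```

Unlike the whitespace-anchored pattern, this does not care what surrounds the parentheses, but it does assume the parentheses in the line are balanced.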