0

I am trying to extract an HTML element name that can appear in several forms, using a single regexp in Perl. For example, I have this:

document.all.ElemName.

and also

document.all["ElemName"].

and this

document.all['ElemName'].

I need to get ElemName, but I can only capture one option. Is it possible to extract it with a single regexp? This is what I have:

document.all[\.\w|\[](\w+) 

That captures only the first example.

user63898

3 Answers

1

You can use named captures, available since Perl v5.10:

#!/usr/bin/env perl
use strict;
use warnings;

my @array = qw{
    document.all.ElemName1.
    document.all["ElemName2"].
    document.all['ElemName3'].
};

for (@array) {
    # Skip input that does not match, so $+{elem} is never read while undefined.
    next unless /
        \b
        document\.all
        (?:
            \.(?<elem>\w+)
            | \["(?<elem>\w+)"\]
            | \['(?<elem>\w+)'\]
        )
        \.
    /x;

    print $+{elem}, "\n";
}
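
If the capture names are not needed, a branch-reset group, also available since Perl v5.10, is one possible alternative: inside `(?| ... )` every alternative numbers its capture as `$1`. The following is a minimal sketch under that assumption; the helper name `elem_name` is hypothetical and not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the element name, or undef when nothing matches.
sub elem_name {
    my ($text) = @_;

    # (?| ... ) is a branch-reset group: each alternative's capture is $1.
    return $text =~ /
        \b document\.all
        (?|
            \.(\w+)
          | \["(\w+)"\]
          | \['(\w+)'\]
        )
        \.
    /x ? $1 : undef;
}

my @lines = (
    'document.all.ElemName1.',
    'document.all["ElemName2"].',
    q{document.all['ElemName3'].},
);

for my $line (@lines) {
    my $name = elem_name($line);
    my $shown = defined $name ? $name : '(no match)';
    print "$shown\n";
}
```

Returning `undef` on failure lets the caller tell "no match" apart from a stale `$1` left over from an earlier successful match.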
creaktive
1

This will match all three cases with ElemName in the first capture group:

document\.all\.?(?:\[["'])?(\w+)(?:['"]\])?

Chris Seymour
  • 1
    This will also match the following incorrect strings: `document.all.ElemName"]`, `document.all["ElemName']`, `document.all.["ElemName"]` ... – Zaid Jan 14 '13 at 14:54
  • @Zaid that is true, but you shouldn't parse HTML with regexp at all, and the OP seems determined to. I don't think it's a problem, as the incorrect *(unbalanced)* examples won't exist in the document; if they do, the OP has more problems than just trying to parse a non-regular language with regexp. – Chris Seymour Jan 14 '13 at 14:59
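
If the unbalanced strings raised in the comments do matter, a stricter pattern is possible. The sketch below is an assumption-laden variant, not the answer's pattern: it uses a named backreference so the closing quote must equal the opening one, and it requires the trailing dot shown in the question's examples. The names `$strict` and `strict_elem_name` are hypothetical, and the malformed inputs have a trailing dot appended so that only the quoting is under test.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stricter variant: \k<quote> must repeat whatever the opening quote was.
my $strict = qr/
    \b document\.all
    (?:
        \.(?<elem>\w+)
      | \[(?<quote>["'])(?<elem>\w+)\k<quote>\]
    )
    \.
/x;

# Hypothetical helper: returns the element name, or undef when nothing matches.
sub strict_elem_name {
    my ($text) = @_;
    return $text =~ $strict ? $+{elem} : undef;
}

my @lines = (
    'document.all.ElemName.',        # well-formed
    'document.all["ElemName"].',     # well-formed
    q{document.all['ElemName'].},    # well-formed
    'document.all.ElemName"].',      # stray closing quote and bracket
    q{document.all["ElemName'].},    # mismatched quotes
    'document.all.["ElemName"].',    # dot before the bracket
);

for my $line (@lines) {
    my $name = strict_elem_name($line);
    my $shown = defined $name ? $name : '(rejected)';
    print "$line => $shown\n";
}
```

The trade-off is readability: the loose one-liner is easier to scan, while this version encodes each accepted form explicitly.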
0

You could use a character class containing both a single and a double quote to match either quote style:

$a = 'document.all.Element["ElemNamea"]';
$b = "document.all.Element['ElemNameb']";
print "a : $a\n";
print "b : $b\n\n";

$a =~ /document\.all\.Element\[['"](\w+)['"]\]/;  # ['"] matches ' or "; dots escaped to match literally
print "result: $a and $1\n";                    # result is in $1
$b =~ /document\.all\.Element\[['"](\w+)['"]\]/;
print "result: $b and $1\n";

output:

a : document.all.Element["ElemNamea"]
b : document.all.Element['ElemNameb']

result: document.all.Element["ElemNamea"] and ElemNamea
result: document.all.Element['ElemNameb'] and ElemNameb
user1967890