2

I have a string that looks like this

my $source = "PayRate=[[sDate=05Jul2017,Rate=0.05,eDate=06Sep2017]],item1,item2,ReceiveRate=[[sDate=05Sep2017,Rate=0.06]],item3" ;

I want to use capture groups to extract only the PayRate values contained within the first [[...]] block.

$1 = "sDate=05Jul2017,Rate=0.05,eDate=06Sep2017"

I tried this but it returns the entire string.

my $match =~ m/PayRate=\[\[(.*)\]\],/ ;

It is clear that I have to put specific patterns for the series of {(.*)=(.*)} blocks inside. Need expert advice.

PExplorer

2 Answers

4

You are using a greedy match, .*, which consumes as much input as possible while still allowing the overall pattern to match, so you're matching from the first [[ to the last ]].

Instead, use a reluctant match .*?, which matches as little as possible while still matching:

my ( $match ) = $source =~ /PayRate=\[\[(.*?)\]\]/;
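To see the difference side by side, here is a minimal sketch that runs both quantifiers against the string from the question. The variable names $greedy and $lazy are only illustrative and do not come from the original post.

```perl
use strict;
use warnings;

my $source = 'PayRate=[[sDate=05Jul2017,Rate=0.05,eDate=06Sep2017]],item1,item2,ReceiveRate=[[sDate=05Sep2017,Rate=0.06]],item3';

# Greedy: .* grabs everything, then backtracks only as far as the LAST "]],"
my ($greedy) = $source =~ /PayRate=\[\[(.*)\]\],/;

# Non-greedy: .*? expands one character at a time and stops at the FIRST "]]"
my ($lazy) = $source =~ /PayRate=\[\[(.*?)\]\]/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy capture runs on into the ReceiveRate block, while the non-greedy capture ends at the close of the PayRate block.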
Borodin
Bohemian
  • Thank you Bohemian! I just learned a little bit more about greedy vs non-greedy quantifiers. Works like a charm now. The link below explains it well: https://stackoverflow.com/questions/3075130/what-is-the-difference-between-and-regular-expressions – PExplorer Dec 03 '17 at 01:53
  • *Reluctant* is not the opposite of *greedy*. – Borodin Dec 03 '17 at 11:55
  • @Borodin what's your point? (btw I did not state that reluctant is the opposite of greedy, although they are more or less "opposite") – Bohemian Dec 03 '17 at 12:21
  • @Bohemian: I'm saying that *reluctant* is a poor choice to describe a *non-greedy* quantifier (as is *lazy*) and as far as I know it's not commonly used outside Oracle's Java documentation. If I had my way I'd choose *frugal*, but that's even less popular. – Borodin Dec 03 '17 at 12:26
0

Use the /x modifier on the match (and substitute) operators so you can add whitespace for readability and comments that explain each part. Restrict what the capture can match with a negated character class: [^\]]*? is better than .*? because it can never run past a closing bracket.

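my ( $match ) = $source =~ m{
    PayRate = \[\[   # select which part (the name is case-sensitive)
    ( [^\]]*? )      # capture anything between the square brackets
    \]\]             # closing brackets; without them the lazy capture stays empty
}x;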
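Once the block is captured, the key=value pairs the question asks about can be split out without a more elaborate regex. The following is a rough sketch; the names $block and %pay_rate are hypothetical, and it assumes that values never contain a comma or a closing bracket.

```perl
use strict;
use warnings;

my $source = 'PayRate=[[sDate=05Jul2017,Rate=0.05,eDate=06Sep2017]],item1,item2,ReceiveRate=[[sDate=05Sep2017,Rate=0.06]],item3';

my ($block) = $source =~ m{
    PayRate = \[\[   # literal prefix; whitespace is ignored under /x
    ( [^\]]* )       # capture everything up to the first closing bracket
    \]\]             # literal closing brackets
}x;

# Assumption: pairs are comma-separated and each pair has a single "=" delimiter
my %pay_rate = map {
    my ( $key, $value ) = split /=/, $_, 2;
    ( $key => $value );
} split /,/, $block;

print "$_ => $pay_rate{$_}\n" for sort keys %pay_rate;
```

Splitting on fixed delimiters keeps the regex short; if values could themselves contain commas, a stricter pattern per field would be needed instead.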
shawnhcorey