While parsing the output of monitoring plugins, I ran into a match result that I did not expect.

First consider this debugger session with Perl 5.18.2:

 DB<6> x $_
0  'last=0.508798;;;0'
  DB<7> x $RE
0  (?^u:^((?^u:\'[^\'=]+\'|[^\'= ]+))=((?^u:\\d+(?:\\.\\d*)?|\\.\\d+))(s|%|[KMT]?B)?(;(?^u:\\d+(?:\\.\\d*)?|\\.\\d+)?){0,4}$)
   -> qr/(?^u:^((?^u:'[^'=]+'|[^'= ]+))=((?^u:\d+(?:\.\d*)?|\.\d+))(s|%|[KMT]?B)?(;(?^u:\d+(?:\.\d*)?|\.\d+)?){0,4}$)/
  DB<8> @m = /$RE/

  DB<9> x @m
0  'last'
1  0.508798
2  undef
3  ';0'
  DB<10>

OK, the regex $RE (intended to match "'label'=value[UOM];[warn];[crit];[min];[max]") looks terrifying at first glance, so let me show how it is constructed:

my $RE_label = qr/'[^'=]+'|[^'= ]+/;
my $RE_simple_float = qr/\d+(?:\.\d*)?|\.\d+/;
my $RE_numeric = qr/[-+]?$RE_simple_float(?:[eE][-+]?\d+)?/;
my $RE = qr/^($RE_label)=($RE_simple_float)(s|%|[KMT]?B)?(;$RE_simple_float?){0,4}$/;

The relevant part is (;$RE_simple_float?){0,4}$, intended to match ";[warn];[crit];[min];[max]" (still not perfect), so for ";;;0" I'd expect @m to end with ';', ';', ';0'. However, all matches except the last one seem to be lost.

Did I misunderstand something, or is it a Perl bug?

U. Windl
2 Answers

When you use a quantifier such as {<number>} (or + or *, for that matter) after a capture group, only the last value matched by the capture group is stored. This explains why you end up with only ;0 instead of ;;;0 in your fourth capture group: (;$RE_simple_float?){0,4} sets the fourth capture group to the last element it matches.
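A minimal sketch of that behavior, using a simplified pattern (;\d* instead of the full $RE_simple_float) chosen here only for illustration:

```perl
use strict;
use warnings;

# The quantified group (;\d*) iterates three times over ';;;0',
# but the capture buffer is overwritten on every iteration.
my @m = ';;;0' =~ /^(;\d*){0,4}$/;

print scalar(@m), " capture: '$m[0]'\n";   # prints: 1 capture: ';0'
```

The match returns one value per capture group in the pattern, not one per iteration, so the list has a single element however often the group repeats.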

To fix that, I would recommend matching the whole end of the string and splitting it afterwards:

my $RE = qr/...((?:;$RE_simple_float?){0,4})$/;
my @m = /$RE/;
my @end = split /;/, $m[3]; # use /(?=;)/ instead to keep each semicolon with its value (';', ';', ';0')
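A runnable sketch of this approach, assuming the elided part of the pattern is the same prefix as in the question and that splitting on the lookahead /(?=;)/ is what you want (it yields the ';', ';', ';0' list the question expected):

```perl
use strict;
use warnings;

# Building blocks copied from the question.
my $RE_label        = qr/'[^'=]+'|[^'= ]+/;
my $RE_simple_float = qr/\d+(?:\.\d*)?|\.\d+/;

# The repeated inner group is non-capturing; the outer group captures the whole tail.
my $RE = qr/^($RE_label)=($RE_simple_float)(s|%|[KMT]?B)?((?:;$RE_simple_float?){0,4})$/;

my ($label, $value, $uom, $tail) = 'last=0.508798;;;0' =~ $RE
    or die "no match";

# Split before each semicolon so every piece keeps its leading ';'.
my @end = split /(?=;)/, $tail;

print "tail:  '$tail'\n";                                  # tail:  ';;;0'
print "parts: ", join(', ', map { "'$_'" } @end), "\n";    # parts: ';', ';', ';0'
```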

Another solution is to repeat the capture group: replace (;$RE_simple_float?){0,4} with

(;$RE_simple_float?)?(;$RE_simple_float?)?(;$RE_simple_float?)?(;$RE_simple_float?)?

The capture groups that do not match will be set to undef. The issue with this approach is that it's a bit verbose, and that it only works for a bounded {} quantifier, not for + or *.
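A sketch of the repeated-group variant applied to the input from the question. The helper name $RE_opt is hypothetical and only saves typing the same group four times; each interpolation becomes a separate capture group:

```perl
use strict;
use warnings;

my $RE_label        = qr/'[^'=]+'|[^'= ]+/;
my $RE_simple_float = qr/\d+(?:\.\d*)?|\.\d+/;

# One optional, capturing ";[number]" field (hypothetical helper).
my $RE_opt = qr/(;$RE_simple_float?)?/;

my $RE = qr/^($RE_label)=($RE_simple_float)(s|%|[KMT]?B)?${RE_opt}${RE_opt}${RE_opt}${RE_opt}$/;

my @m = 'last=0.508798;;;0' =~ $RE;

# Groups 4..7 are at indexes 3..6; unmatched groups come back as undef.
print join(', ', map { defined $_ ? "'$_'" : 'undef' } @m[3 .. 6]), "\n";
# prints: ';', ';', ';0', undef
```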

Dada
  • Do you have a reference inside "perlre" for that explanation? I browsed it, but it's a bit lengthy. I'd think if it's not mentioned somewhere, it is a bug in Perl. – U. Windl Aug 26 '21 at 09:58
  • @U.Windl I didn't find any reference to this in perlre and perlretut. However, this behavior is well known. See for instance [this StackOverflow question](https://stackoverflow.com/questions/25986557/perl-regex-to-capture-repeating-group), [this other StackOverflow question](https://stackoverflow.com/questions/3459721/regex-group-in-perl-how-to-capture-elements-into-array-from-regex-group-that-ma) or [this blog post](http://blogs.perl.org/users/sirhc/2012/05/repeated-capturing-and-parsing.html) – Dada Aug 26 '21 at 10:24
  • Thanks. The tag `capture-group` was hard to find; I just tried to add a description for it. – U. Windl Aug 26 '21 at 10:51

The following demo code uses split to obtain the data of interest. Check whether it fits as a solution for your problem.

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

while( <DATA> ) {
    chomp;
    say;
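    # Assign the fields to named keys with a hash slice. A plain slice is used
    # instead of postfix dereference (->@{...}), which needs Perl 5.24 or later.
    my %record;
    @record{qw/label value warn crit min max/} = split(/[=;]/,$_);
    say Dumper(\%record);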
}

exit 0;

#'label'=value[UOM];[warn];[crit];[min];[max]

__DATA__
'label 1'=0.3345s;0.8s;1.2s;0.2s;3.2s
'label 2'=10%;7%;18%;2%;28%
'label 3'=0.5us;2.3us

Output

'label 1'=0.3345s;0.8s;1.2s;0.2s;3.2s
$VAR1 = {
          'crit' => '1.2s',
          'warn' => '0.8s',
          'value' => '0.3345s',
          'label' => '\'label 1\'',
          'max' => '3.2s',
          'min' => '0.2s'
        };

'label 2'=10%;7%;18%;2%;28%
$VAR1 = {
          'min' => '2%',
          'max' => '28%',
          'label' => '\'label 2\'',
          'value' => '10%',
          'warn' => '7%',
          'crit' => '18%'
        };

'label 3'=0.5us;2.3us
$VAR1 = {
          'min' => undef,
          'max' => undef,
          'label' => '\'label 3\'',
          'warn' => '2.3us',
          'value' => '0.5us',
          'crit' => undef
        };
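For comparison, a short sketch of the same split applied to the string from the question. Empty thresholds come back as empty strings, while fields that are absent at the end stay undef:

```perl
use strict;
use warnings;

my %record;
@record{qw/label value warn crit min max/} = split /[=;]/, 'last=0.508798;;;0';

for my $key (qw/label value warn crit min max/) {
    my $shown = defined $record{$key} ? "'$record{$key}'" : 'undef';
    print "$key => $shown\n";
}
# prints:
# label => 'last'
# value => '0.508798'
# warn => ''
# crit => ''
# min => '0'
# max => undef
```

Note that split does not validate the fields and leaves any unit attached to the value, so the regex from the question is still useful if the input has to be checked.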
Polar Bear
  • Honestly, this is an elegant solution for the problem, but it does not actually answer the question of *why* my code does not work. – U. Windl Aug 26 '21 at 10:02