Is it possible to simplify this regex and still capture all groups?

use Data::Dumper;

my $str = "1 2 3 4 ;";
$str =~ /(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+/;
print Dumper( [$1, $2, $3, $4] );

Output:

$VAR1 = [
          '1',
          '2',
          '3',
          '4'
        ];

I tried using a quantifier to simplify:

$str =~ /(?:(\d+)\s+){4}/;

but it gives:

$VAR1 = [
          '4',
          undef,
          undef,
          undef
        ];
Håkon Hægland
  • The first regex looks as simple and straightforward as it gets. In which way should it be simpler? Simpler to scale to more groups? Simpler to read? Simpler to maintain? Consider looking at the definition of the tag "optimization". – Yunnosch Jul 19 '20 at 08:09
  • Simpler to scale to N groups – Håkon Hægland Jul 19 '20 at 08:12
  • I recommend clarifying that by editing your question. Consider showing the cases for 5 and 6 groups as examples to help visualise your problem. – Yunnosch Jul 19 '20 at 08:13
  • Just `\d+` would give 4 matches instead of actual groups... maybe that's the simplest? – JvdV Jul 19 '20 at 08:28
  • Also: https://stackoverflow.com/questions/62785822/how-do-i-grab-an-unknown-number-of-captures-from-a-pattern/62786052 – brian d foy Jul 19 '20 at 15:11

2 Answers

Yes, use /g to match all numbers.

my @matches = $str =~ /\d+/g;

Alternatively, split on whitespace and filter for numbers.

my @matches = grep { /^\d+$/ } split /\s+/, $str;

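Your attempt, $str =~ /(?:(\d+)\s+){4}/;, does not work because a quantifier repeats the match, not the capture group. {4} makes the pattern match all four instances of \d+\s+, but there is still only a single pair of capturing parentheses, so only $1 exists. Each iteration overwrites it, which leaves the last value, 4.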
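
To see the difference side by side, here is a minimal sketch that runs both patterns on the string from the question. The array names @quantified and @global are illustrative, not from the original code.

```perl
use strict;
use warnings;

my $str = "1 2 3 4 ;";

# A quantified group still has only one pair of capturing parentheses,
# so list context returns a single value: the capture of the last iteration.
my @quantified = $str =~ /(?:(\d+)\s+){4}/;

# With /g in list context, every match of the whole pattern is returned.
my @global = $str =~ /\d+/g;

print "quantified: @quantified\n";    # quantified: 4
print "global: @global\n";            # global: 1 2 3 4
```

Note that the /g version does not check that there are exactly four numbers; if that matters, compare scalar(@global) against the expected count afterwards.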

Schwern

If you want the groups by number, you would have to create them all as you did.

Using (?:(\d+)\s+){4} will repeat the outer group 4 times, capturing only the value of the last iteration in group 1.
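
If the goal is to scale to N numbered groups without typing them out, one possibility is to build the pattern string with the repetition operator x, so the compiled regex really does contain N capture groups. This is only a sketch of that idea; $n, $pattern and @groups are names assumed here for illustration.

```perl
use strict;
use warnings;

my $str = "1 2 3 4 ;";
my $n   = 4;    # number of groups wanted (illustrative value)

# Repeat the sub-pattern $n times; for $n = 4 this yields the same
# pattern as the hand-written regex in the question.
my $pattern = '(\d+)\s+' x $n;
my $re      = qr/$pattern/;

# In list context a successful match returns all captures ($1 .. $n).
my @groups = $str =~ $re;

if (@groups) {
    print "$_\n" for @groups;
}
```

Because the groups are returned as a list, there is no need to refer to $1, $2 and so on individually, and a failed match simply leaves @groups empty.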


One option could be to use \G to get all the digits in group 1.

\G(\d+)\h+(?=[\d\h]*;)

Explanation

  • \G Assert the position at the end of the previous match, or at the start of the string for the first match
  • (\d+)\h+ Capture group 1, capture 1+ digits and match 1+ horizontal whitespace chars
  • (?= Positive lookahead, assert what is on the right is
    • [\d\h]*; Match 0+ times a digit or horizontal whitespace char and ;
  • ) Close lookahead

For example

my $str = "1 2 3 4 ;";
while ($str =~ /\G(\d+)\h+(?=[\d\h]*;)/g) {
    print "$1\n";
}

Output

1
2
3
4
The fourth bird