2

I am using groups to try to match on a certain pattern, and am not getting quite the results I expect. The pattern of interest is as follows:

([0-9]+(\.[0-9]+)+)

For string 1.23, I get $1=1.23, and $2=.23 which makes sense to me.

But for string 1.2.3, I get $1=1.2.3 and $2=.3, where I would expect $2=.2.3, because its group is a decimal point and a digit, repeated.

Can someone please explain to me how this works? Thank you!

prelic
  • You're close. To get what you are after in `$2`, you need another set of parentheses. See my answer below. – DavidRR Dec 03 '13 at 16:16

3 Answers

4

When a capturing group is followed by a quantifier, each repetition overwrites the previous capture, so only the last repetition is stored in the group's variable.
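To see that behaviour directly, here is a minimal sketch that runs the pattern from the question against both sample strings. The helper name `question_groups` is only an illustration and is not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return ($1, $2) for the pattern from the question,
# or an empty list when the text does not match.
sub question_groups {
    my ($text) = @_;
    return $text =~ /([0-9]+(\.[0-9]+)+)/ ? ($1, $2) : ();
}

for my $text ('1.23', '1.2.3') {
    my ($whole, $last) = question_groups($text);
    # $last holds only the final repetition of the quantified inner group.
    print "$text: \$1=$whole \$2=$last\n";
}
```

For 1.2.3 the inner group matches .2 first and .3 second, and only the second capture is still available after the match.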

Hunter McMillen
3

"These pattern match variables are scalars and, as such, will only hold a single value. That value is whatever the capturing parentheses matched last."

http://blogs.perl.org/users/sirhc/2012/05/repeated-capturing-and-parsing.html

In your example, $1 matches 1.2.3. As the inner group repeats, $2 is first set to .2 and is then overwritten by the final repetition, .3.
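If you need every repetition rather than just the last one, one possible approach (a sketch, not taken from the linked article) is to capture the whole repeated tail once and then pull the pieces out with a global match in list context. The helper name `fraction_parts` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return each ".digits" component of a dotted number,
# or an empty list if the text is not of the form digits(.digits)+
sub fraction_parts {
    my ($text) = @_;

    # Capture the entire repeated tail in one non-repeating group.
    return () unless $text =~ /\A[0-9]+((?:\.[0-9]+)+)\z/;
    my $tail = $1;

    # With /g in list context, the match returns every capture in order.
    my @parts = $tail =~ /(\.[0-9]+)/g;
    return @parts;
}

print join(' ', fraction_parts('1.22.333.444')), "\n";   # prints ".22 .333 .444"
```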

fugu
3

Perhaps this regex will meet your needs:

\b(\d+)((?:\.\d+)+)\b

This regex separates the leading integer sequence from its repeating fractional components.

(As indicated by @ysth, please keep in mind that \d may match more characters than you intend. If that is the case, use the character class [0-9] instead or use the /a modifier.)
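A small sketch of that difference, assuming Perl 5.14 or later (needed for /a): it tests a single non-ASCII decimal digit, U+0663 ARABIC-INDIC DIGIT THREE, against each variant. The helper name `digit_checks` is hypothetical.

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical helper: report which digit patterns accept a single character.
sub digit_checks {
    my ($char) = @_;
    return (
        unicode_d => ($char =~ /\A\d\z/    ? 1 : 0),  # \d with Unicode rules
        ascii_d   => ($char =~ /\A\d\z/a   ? 1 : 0),  # \d restricted by /a
        class_0_9 => ($char =~ /\A[0-9]\z/ ? 1 : 0),  # explicit ASCII class
    );
}

# U+0663 is a decimal digit in Unicode but is not one of the ASCII digits 0-9.
my %result = digit_checks("\x{0663}");
print "$_: $result{$_}\n" for sort keys %result;
```

Here only the plain \d variant accepts the character; \d under /a and [0-9] both reject it.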

Here's a Perl program that demonstrates this regex on a sample data set. (Also see the live demo.)

#!/usr/bin/perl -w

use strict;
use warnings;

while (<DATA>) {
    chomp;

    # A - A sequence of digits
    # B - A period and a sequence of digits
    # C - Repeat 'B'.

    if (/\b(\d+)((?:\.\d+)+)\b/) {
#           ^^^     ^^^^^
#            A        B
#                ^^^^^^^^^^
#                     C

        print "[$1]  [$2]\n";
    }
}

__END__
1.23
123.456
1.2.3
1.22.333.444

Expected Output:

[1]  [.23]
[123]  [.456]
[1]  [.2.3]
[1]  [.22.333.444]
DavidRR
  • changing `[0-9]` to `\d` matches a whole lot more characters (unless you also use the /a flag) – ysth Dec 03 '13 at 16:13
  • @ysth - [Does “\d” in regex mean a digit?](http://stackoverflow.com/a/6479605/1497596). Does that answer fully define what you mean by "a whole lot more characters"? – DavidRR Dec 03 '13 at 16:31
  • From the [PerlRE doc](http://perldoc.perl.org/perlre.html): `/d`, `/u`, `/a`, and `/l`, available starting in **5.14**, are called the **character set modifiers**; they affect the character set semantics used for the regular expression. ... The `/a` modifier, on the other hand, may be useful. Its purpose is to allow code that is to work mostly on **ASCII** data to not have to concern itself with Unicode. – DavidRR Dec 03 '13 at 16:37