I have a string in the following format: _num1_num2. I need to assign the num1 and num2 values to some variables. My regex is (\d+) and it shows me the correct match groups on Rubular.com, but I don't know how to assign these match groups to variables. Can anybody help me? Thanks in advance.

3 Answers
That should be (assuming your string is stored in '$string'):
my ($var1, $var2) = $string =~ /_(\d+)_(\d+)/s;
The idea is to grab digits until you reach a non-digit character: here '_'.
Each capturing group is then assigned to its respective variable.
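As a minimal sketch of that assignment with a check for a failed match: in list context the match returns the captured groups, or an empty list when nothing matches, so the assignment itself can serve as the condition. The helper name `parse_pair` and the sample inputs are assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# parse_pair is a hypothetical helper name, not part of the original answer.
# In list context the match returns the captured groups,
# or an empty list when the pattern does not match.
sub parse_pair {
    my ($string) = @_;
    return $string =~ /_(\d+)_(\d+)/;
}

# Sample inputs, assumed for illustration.
for my $input ( '_12_345', 'no numbers here' ) {
    # A list assignment in a condition is true only if the right side returned values.
    if ( my ( $var1, $var2 ) = parse_pair($input) ) {
        print "'$input': var1=$var1 var2=$var2\n";
    }
    else {
        print "'$input': no match\n";
    }
}
```

Checking the result this way avoids working with undefined variables when the input does not have the expected shape.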
As mentioned in this question (and in the comments below by Kaoru):
\d can indeed match more than 10 different characters when applied to Unicode strings.
So you can use instead:
my ($var1, $var2) = $string =~ /_([0-9]+)_([0-9]+)/s;
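To illustrate the difference, here is a small sketch using one non-ASCII digit. The sample string is an assumption chosen for the demonstration; the `/a` modifier shown as a third option is available from Perl 5.14 and is not mentioned in the answer itself.

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE: a Unicode decimal digit, but not ASCII 0-9.
my $string = "_\x{0663}_7";    # sample input, assumed for illustration

my $unicode_ok = $string =~ /_(\d+)_(\d+)/       ? 1 : 0;    # \d accepts any Unicode digit
my $ascii_ok   = $string =~ /_([0-9]+)_([0-9]+)/ ? 1 : 0;    # [0-9] is ASCII only
my $a_flag_ok  = $string =~ /_(\d+)_(\d+)/a      ? 1 : 0;    # /a restricts \d to ASCII

print "\\d: $unicode_ok, [0-9]: $ascii_ok, \\d with /a: $a_flag_ok\n";
```

Only the plain `\d` pattern accepts the string, so either `[0-9]` or `/a` is the safer choice when the captured values must be usable as ordinary numbers.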
-
hey!, I don't know perl but if we *match* `\d+`, won't `num1` and `num2` matched? – Amit Joki May 31 '14 at 06:39
-
@AmitJoki not if both numbers are separated by a non-number '_'. – VonC May 31 '14 at 06:40
-
Oh, thank you, it works. Could you also explain the regex please? I want to understand so I can do it myself next time. Particulary why it doesn't return `_` and what is `gs`? – kulan May 31 '14 at 06:40
-
@VonC, so it doesn't work the way javascript does. For example `"_12_24".match(/\d+/g)` returns `["12", "24"]`. – Amit Joki May 31 '14 at 06:41
-
@AmitJoki Yes, each capturing group would only get one number here. – VonC May 31 '14 at 06:45
-
@AmitJoki you can see more at http://modernperlbooks.com/books/modern_perl/chapter_06.html#Q2FwdHVyaW5n for instance. – VonC May 31 '14 at 06:48
-
Note that the \d shorthand can be a bit dangerous when running against unicode/character strings because it will match non-digit number characters. – Kaoru May 31 '14 at 15:31
-
@Kaoru Ok, I have edited the question accordingly, and added a link to another SO question which is about the issue you mention. – VonC May 31 '14 at 15:35
Using the g modifier also allows you to do away with the grouping parentheses:
my ($five, $sixty) = '_5_60' =~ /\d+/g;
This allows any separation of integers but it doesn't verify the input format.
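A short sketch of that trade-off, with sample inputs assumed for illustration: the `/g` form accepts any text that contains two runs of digits, while a pattern anchored with `\A` and `\z` accepts only the exact `_num1_num2` shape.

```perl
use strict;
use warnings;

# Sample inputs, assumed for illustration.
for my $input ( '_5_60', '5 apples, 60 pears' ) {
    # /g in list context returns every run of digits, whatever surrounds them.
    my @loose = $input =~ /\d+/g;

    # Anchoring the whole pattern accepts only the exact _num1_num2 shape.
    my @strict = $input =~ /\A_(\d+)_(\d+)\z/;

    printf "%-22s loose=(%s) strict=(%s)\n",
        "'$input'", join( ',', @loose ), join( ',', @strict );
}
```

For the second input the loose match still yields 5 and 60, while the anchored match returns an empty list.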

The use of the global flag in the first answer is a bit confusing. The regex /_(\d+)_(\d+)/ already captures two integers, and the g modifier additionally tries to match multiple times, so it is redundant there.
IMHO the g modifier should be used when the number of matches is unknown or when it simplifies the regex.
As far as I see this works exactly the same way as in JavaScript.
Here are some examples:
use strict;
use warnings;
use Data::Dumper;
my $str_a = '_1_22'; # two integers, each preceded by an underscore
# expect two integers
# using the g modifier for global matching
my ($int1_g, $int2_g) = $str_a =~ m/_(\d+)/g;
print "global:\n", Dumper( $str_a, $int1_g, $int2_g ), "\n";
# match two ints explicitly
my ( $int1_e, $int2_e) = $str_a =~ m/_(\d+)_(\d+)/;
print "explicit:\n", Dumper( $str_a, $int1_e, $int2_e ), "\n";
# matching an unknown number of integers
my $str_b = '_1_22_333_4444';
my @ints = $str_b =~ m/_(\d+)/g;
print "multiple integers:\n", Dumper( $str_b, \@ints ), "\n";
# alternatively you can use split; the leading '_' yields an empty first field, so skip it
my ( undef, $int1_s, $int2_s ) = split m/_/, $str_a;
print "split:\n", Dumper( $str_a, $int1_s, $int2_s ), "\n";
