6

I have a string in the following format: _num1_num2. I need to assign the num1 and num2 values to variables. My regex is (\d+), and it shows the correct match groups on Rubular.com, but I don't know how to assign these match groups to variables. Can anybody help me? Thanks in advance.

kulan

3 Answers

12

That should be (assuming your string is stored in '$string'):

my ($var1, $var2) = $string =~ /_(\d+)_(\d+)/s; 

The idea is to capture digits until a non-digit character is reached, here '_'.

Each capturing group is then assigned to its respective variable: in list context, a successful match returns the contents of the capturing groups.
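
To illustrate why the underscores are not returned: only the text inside the parentheses is captured, and a failed match returns an empty list, so the assignment itself can be tested. A minimal sketch, assuming a sample value for $string and using illustrative variable names:

```perl
use strict;
use warnings;

my $string = '_12_24';    # sample input, assumed for this sketch

# In list context the match returns only the captures, not the whole match,
# so the '_' characters outside the parentheses are never returned.
# A list assignment tested with "or" is false when nothing was returned,
# i.e. when the pattern did not match.
my ( $num1, $num2 ) = $string =~ /_(\d+)_(\d+)/
    or die "unexpected format: $string\n";

print "num1=$num1 num2=$num2\n";
```

Checking the result this way avoids silently continuing with undefined variables when the input is not in the expected format.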


As mentioned in this question (and in the comments below by Kaoru):

\d can indeed match more than 10 different characters, if applied to Unicode strings.

So you can use instead:

my ($var1, $var2) = $string =~ /_([0-9]+)_([0-9]+)/s; 
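
The difference can be checked with a single non-ASCII digit. The sketch below uses ARABIC-INDIC DIGIT THREE (U+0663) as an assumed test character; it also shows the /a modifier (available from Perl 5.14), which restricts \d to ASCII digits and is an alternative to writing [0-9]:

```perl
use 5.014;      # the /a modifier needs Perl 5.14 or later
use strict;
use warnings;

my $arabic = "\x{0663}";    # ARABIC-INDIC DIGIT THREE, chosen for illustration

# \d follows Unicode rules on this string, so it matches the non-ASCII digit.
my $matches_d     = $arabic =~ /^\d$/     ? 1 : 0;
# An explicit ASCII character class does not match it.
my $matches_class = $arabic =~ /^[0-9]$/  ? 1 : 0;
# With /a, \d is restricted to the ASCII digits 0-9.
my $matches_d_a   = $arabic =~ /^\d$/a    ? 1 : 0;

print "\\d: $matches_d, [0-9]: $matches_class, \\d with /a: $matches_d_a\n";
```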
Community
VonC
  • hey!, I don't know perl but if we *match* `\d+`, won't `num1` and `num2` be matched? – Amit Joki May 31 '14 at 06:39
  • @AmitJoki not if both numbers are separated by a non-number '_'. – VonC May 31 '14 at 06:40
  • Oh, thank you, it works. Could you also explain the regex please? I want to understand so I can do it myself next time. Particularly why it doesn't return `_` and what is `gs`? – kulan May 31 '14 at 06:40
  • @VonC, so it doesn't work the way javascript does. For example `"_12_24".match(/\d+/g)` returns `["12", "24"]`. – Amit Joki May 31 '14 at 06:41
  • @AmitJoki Yes, each capturing group would only get one number here. – VonC May 31 '14 at 06:45
  • @AmitJoki you can see more at http://modernperlbooks.com/books/modern_perl/chapter_06.html#Q2FwdHVyaW5n for instance. – VonC May 31 '14 at 06:48
  • Note that the \d shorthand can be a bit dangerous when running against unicode/character strings because it will match non-digit number characters. – Kaoru May 31 '14 at 15:31
  • @Kaoru Ok, I have edited the answer accordingly, and added a link to another SO question which is about the issue you mention. – VonC May 31 '14 at 15:35
3

Using the g modifier also allows you to do away with the grouping parentheses:

my ($five, $sixty) = '_5_60' =~ /\d+/g;

This allows any separation of integers but it doesn't verify the input format.
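
The trade-off can be seen on an input that is not in the _num1_num2 format. A small sketch, with an assumed malformed string and an anchored pattern added here only for comparison:

```perl
use strict;
use warnings;

my $bad = 'abc5xyz60';    # assumed input that is NOT in _num1_num2 format

# The global match collects every run of digits, whatever separates them.
my @loose = $bad =~ /\d+/g;

# An anchored pattern with explicit groups returns an empty list
# when the whole string does not have the expected shape.
my @strict = $bad =~ /\A_(\d+)_(\d+)\z/;

print "loose: @loose\n";
print "strict matched: ", ( @strict ? "yes" : "no" ), "\n";
```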

Richard RP
1

The use of the global flag in the first answer is a bit confusing. The regex /_(\d+)_(\d+)/ already captures both integers, while the g modifier makes the regex try to match multiple times, so the flag is redundant there.

IMHO the g modifier should be used when the number of matches is unknown or when it simplifies the regex.

As far as I can see, this works exactly the same way as in JavaScript.

Here are some examples:

use strict;
use warnings;

use Data::Dumper;

my $str_a = '_1_22'; # two integers, each preceded by an underscore

# expect two integers

# using the g modifier for global matching
my ($int1_g, $int2_g) = $str_a =~ m/_(\d+)/g;
print "global:\n", Dumper( $str_a, $int1_g, $int2_g ), "\n";

# match two ints explicitly
my ( $int1_e, $int2_e) = $str_a =~ m/_(\d+)_(\d+)/;
print "explicit:\n", Dumper( $str_a, $int1_e, $int2_e ), "\n";

# matching an unknown number of integers
my $str_b = '_1_22_333_4444';
my @ints = $str_b =~ m/_(\d+)/g;
print "multiple integers:\n", Dumper( $str_b, \@ints ), "\n";

# alternatively you can use split; the leading '_' yields an empty first field, which is discarded
my ( undef, $int1_s, $int2_s ) = split m/_/, $str_a;
print "split:\n", Dumper( $str_a, $int1_s, $int2_s ), "\n";
BarneySchmale