Perl parse String with one or more fields

Question

I have a string I need to parse. It meets the following requirements:

It is comprised of 0 or more key->value pairs.
The key is always 2 letters.
The value is one or more digits.
There will not be a space between the key and value.
There may or may not be a space between individual pairs.

Example strings I may see:

AB1234 //One key->value pair (Key=AB, Value=1234)
AB1234 BC2345 //Two key->value pairs, separated by space
AB1234BC2345 //Two key->value pairs, not separated by space
//Empty string, no key->value pairs
AB12345601BC1234CD1232PE2343 //Lots of key->value pairs, no space
AB12345601 BC1234 CD1232 PE2343 //Lots of key->value pairs, with spaces

I need to build a Perl hash of this string. If I could guarantee it was 1 pair I would do something like this:

$string =~ /([A-Z][A-Z])([0-9]+)/;
$key = $1;
$value = $2;
$hash{$key} = $value;

For multiple pairs, I could potentially match the regex above, take the substring of the original string that follows the match, and search again, repeating until nothing is left. However, I'm sure there's a more clever, perl-esque way to achieve this.

Wishing I didn't have such a crappy data source to deal with-

Jonathan

See also [How can I store regex captures in an array in Perl?](http://stackoverflow.com/questions/2304577/). — outis, Nov 25 '11 at 23:09

outis · Accepted Answer · 2011-11-25T23:09:00.137

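In list context with the global flag (/g), a regex match returns the captured substrings from every match as one flat list (or the whole matches, if the pattern has no capture groups):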

use strict;
use warnings;
use Data::Dumper;

my @strs = (
    'AB1234',
    'AB1234 BC2345',
    'AB1234BC2345',
    '',
    'AB12345601BC1234CD1232PE2343',
    'AB12345601 BC1234 CD1232 PE2343'
);

for my $str (@strs) {
    # The money line: the flat list of captures is paired up as key => value
    my %parts = ($str =~ /([A-Z][A-Z])(\d+)/g);

    print Dumper(\%parts);
}

For greater opacity, remove the parentheses around the pattern matching: %parts = $str =~ /([A-Z][A-Z])(\d+)/g;.
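
To make the intermediate step visible, here is a minimal sketch that stores the list returned by the match in an array before turning it into a hash. The sample string and the variable names `@flat` and `%parts` are only for illustration.

```perl
use strict;
use warnings;

my $str = 'AB12345601 BC1234CD1232';

# In list context, /g returns every capture from every match as one flat list:
# key, value, key, value, ...
my @flat = $str =~ /([A-Z][A-Z])(\d+)/g;

# Assigning an even-length list to a hash pairs the elements up as key => value.
my %parts = @flat;

print scalar(@flat), " captures\n";
print "$_ => $parts{$_}\n" for sort keys %parts;
```

Note that if the same key appears twice in the string, the later value silently overwrites the earlier one.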

score 3 · Answer 2 · answered Nov 25 '11 at 23:05

You are already there:

$hash{$1} = $2 while $string =~ /([[:alpha:]]{2})([0-9]+)/g;
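
If the data source cannot be trusted, the same loop can be made strict. The following is only a sketch of one way to do that, not part of the answer itself: it swaps in `\G` with the `/gc` flags so that each match must start where the previous one ended, and it dies on anything left over. The helper name `parse_pairs` is hypothetical, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# parse_pairs is a hypothetical helper name used only for this sketch.
sub parse_pairs {
    my ($string) = @_;
    my %hash;

    # \G anchors each match at the end of the previous one; /c keeps pos()
    # in place after the final failed match so the leftovers can be checked.
    while ($string =~ /\G\s*([A-Z]{2})([0-9]+)/gc) {
        $hash{$1} = $2;
    }

    # Anything other than trailing whitespace means the input broke the format.
    my $rest = substr($string, pos($string) // 0);
    die "Unparsed input: '$rest'\n" if $rest =~ /\S/;

    return %hash;
}

my %h = parse_pairs('AB1234BC2345 CD7');
print join(', ', map { "$_=$h{$_}" } sort keys %h), "\n";
```

With this version, strings such as A122 or ABC123 raise an error instead of being silently skipped.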

choroba

score 0 · Answer 3 · answered Nov 25 '11 at 23:09

Assuming your strings are definitely going to match your scheme (i.e. there won't be any strings of the form A122 or ABC123), then this should work:

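use Data::Dumper;

my @strings = ( 'AB1234', 'AB1234 BC2345', 'AB1234BC2345' );

foreach my $string (@strings) {
    $string =~ s/\s+//g;    # drop the optional spaces between pairs
    # split returns the captured keys along with the values;
    # $first soaks up the empty field before the first key.
    my ( $first, %elems ) = split(/([A-Z]{2})/, $string);
    while (my ($key,$value) = each %elems) {
        delete $elems{$key} unless $key =~ /^[A-Z]{2}$/;
        delete $elems{$key} unless $value =~ /^\d+$/;    # values are one or more digits
    }
    print Dumper \%elems;
}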

CanSpice

The pure regex answers look a little cleaner. I was just trying something different with `split`. :-) – CanSpice Nov 25 '11 at 23:10
If it all comes in one string you could do something like `$string =~ s/\s+//g; my %h = map{split/(?<=\D)(?=\d)/}split/(?<=\d)(?=\D)/, $string;` – flesk Nov 25 '11 at 23:30
Or simply `%h = split /\s*(\d+)\s*/, $string` – TLP Nov 26 '11 at 02:02
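
As a quick sanity check on the `split` one-liner from the last comment, this sketch compares it with the global-match approach on some of the question's sample strings. It assumes well-formed input; the variable names and the `$mismatches` counter are just for illustration.

```perl
use strict;
use warnings;

my $mismatches = 0;

for my $string ('AB1234', 'AB1234 BC2345', 'AB1234BC2345', '') {
    # Global match in list context: a flat key, value, key, value list.
    my %by_match = $string =~ /([A-Z][A-Z])(\d+)/g;

    # Split on each run of digits; the capture group keeps the digits in the
    # returned list, and the surrounding \s* swallows the optional spaces.
    my %by_split = split /\s*(\d+)\s*/, $string;

    my $left  = join ',', map { "$_=$by_match{$_}" } sort keys %by_match;
    my $right = join ',', map { "$_=$by_split{$_}" } sort keys %by_split;

    $mismatches++ if $left ne $right;
    printf "%-16s %s\n", "'$string'", $left eq $right ? 'same' : 'different';
}
```

The two agree on these inputs, but the `split` form does no validation of the keys, so it is only a fair substitute when the strings are known to follow the format.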
