In short, extract that index (1 or 2 ...) from a line of data, use it to form the right key, and add the line to that key's arrayref, relying on autovivification to create it when needed
push @{$data{"state_$1"}}, $line if $line =~ /input_([0-9])/;
On lines with data input_N the regex captures the first digit after input_ and adds that line to the arrayref at the suitable key (use [0-9]+ if an index can have more than one digit). The key name is built from the captured number and the fixed prefix. The very first time a key is used, when it isn't in the hash yet, it is created by the feature called autovivification.†
Then there are a few details. If the token names in real data (in place of input and state) aren't known ahead of time, they can be extracted from the first line of data and then used as above. The linefeed needs to be removed as each line is read. Altogether
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd); # to see complex data; or, use core Data::Dumper
my %data;
while (<>) { # reads line by line the files given on command line
    chomp;
    push @{$data{"state_$1"}}, $_ if /input_([0-9])/;
}
dd \%data;
This assumes that the keys/data have fixed prefixes and that those are known (state, input).
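For the case where the token name in the data is not known ahead of time, here is a minimal sketch of taking it from the first line. The sample lines, the token name sensor, and the choice to keep state as the key prefix are all assumptions made for illustration; the actual data is in the question and may need a different pattern.

```perl
use strict;
use warnings;

# Hypothetical data; the program does not know the token name "sensor"
my @lines = ('sensor_1 a', 'sensor_2 b', 'sensor_1 c', 'sensor_10 d');

# Take the token name from the first line: letters right before _<digits>
my ($token) = $lines[0] =~ /([A-Za-z]+)_[0-9]+/
    or die "No token of the form name_N on the first line\n";

my $key_prefix = 'state';   # assumption: the key prefix is chosen by hand

my %data;
for my $line (@lines) {
    # \Q...\E quotes the extracted name in case it has regex metacharacters;
    # [0-9]+ also handles indices with more than one digit
    push @{ $data{"${key_prefix}_$1"} }, $line
        if $line =~ /\Q$token\E_([0-9]+)/;
}

for my $key (sort keys %data) {
    print "$key: ", join(' | ', @{ $data{$key} }), "\n";
}
```

If the key prefix also has to come from the data, it can be captured in the same way, but that depends on how the first line is laid out.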
A one-liner example
perl -MData::Dump -wnlE'push @{$h{"state_$1"}}, $_ if /input_([0-9])/; }{ dd \%h' file
where file contains what is given in the question. That }{ closes the implicit loop that -n wraps around the code, so what follows it behaves like an END block: it is executed once, after all lines have been read and processed and the program is about to exit.
† This is a feature whereby a needed object is created when an undefined value (where the object would be) is dereferenced in an "lvalue context" (where it needs to be "modifiable").
A specific example above: we "use" (dereference) the key state_1 (etc) by adding to the arrayref that is its value, push @{ $data{state_1} }, $value, but there is no such key there the first time! Well, it's made for us on the fly.
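A small sketch of that moment, using the same key name; the pushed string is just a placeholder value:

```perl
use strict;
use warnings;

my %data;

# Before the first push the key is not in the hash at all
my $before = exists $data{state_1} ? 'present' : 'absent';
print "before: $before\n";

# Dereferencing the undefined value as an array in order to modify it
# makes perl create an anonymous array and store its reference under the key
push @{ $data{state_1} }, 'first line';

print "type:  ", ref($data{state_1}), "\n";
print "count: ", scalar @{ $data{state_1} }, "\n";
```

Without autovivification one would first have to check for the key and assign an empty arrayref ($data{state_1} = [] unless exists $data{state_1}) before every such push.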
See for instance an article from The Effective Perler, and then there is far more of it around. Here are some generic examples and discussion, and here is a trickier example of when it kicks in or not.
Entry from perlglossary
In Perl, storage locations (lvalues) spontaneously generate themselves as needed, including the creation of any hard reference values to point to the next level of storage. The assignment $a[5][5][5][5][5] = "quintet" potentially creates five scalar storage locations, plus four references (in the first four scalar locations) pointing to four new anonymous arrays (to hold the last four scalar locations). But the point of autovivification is that you don’t have to worry about it.
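The glossary's assignment can be tried directly. The sketch below also shows one case where autovivification may surprise: merely testing a nested key creates the intermediate level. The hash and its key names (outer, inner) are hypothetical.

```perl
use strict;
use warnings;

my @a;
$a[5][5][5][5][5] = "quintet";

print scalar(@a), "\n";     # number of top-level slots after the assignment
print ref($a[5]), "\n";     # what was created at index 5

my %h;
# Only a test, nothing is assigned, yet $h{outer} must be dereferenced
# to look inside it, so it gets created along the way
my $found = exists $h{outer}{inner};
print exists $h{outer} ? "outer was created\n" : "outer is absent\n";
```

To avoid that side effect, test the levels one at a time, as in exists $h{outer} && exists $h{outer}{inner}.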