How do I group values from a file into a hash according to specific Regular Expressions in Perl?

Question

I have a .txt file with data in one column like so:

state_1
state_2
state_3
input_11
input_12
input_13
input_21
input_22
input_31

And I want to group these tags into a hash according to their names and numbers:

state_1:              # For state_1, we have all "input1x" data tags
  -input_11
  -input_12
  -input_13
state_2:              # For state_2, we have all "input2x" data tags
  -input_21
  -input_22
state_3:              # For state_3, we have all "input3x" data tags
  -input_31

I tried using push, like in this case, to force the values into the hash as arrays with push @{ $state{$inputs} }, $_;, but I am stuck on how to loop over the data and build the desired hash according to the regular expressions.

I have also researched how to group my data according to regex, but I still cannot find a way to group the data as I get it from the .txt file.

My question is, what am I looking for to be able to group these tags accordingly?

The shown output looks like YAML -- is that on purpose, to output YAML? — zdim, Jan 30 '22 at 19:16
@zdim The program I am trying to make will rely on other code to print out the desired data from the hashes I create. I use the YAML package since it is easier to read, but not as the final output. — Matthias, Jan 30 '22 at 19:24
Thanks, alright then I won't add that to my post -- which I've edited further btw. — zdim, Jan 30 '22 at 19:47

zdim · Accepted Answer · 2022-01-31T17:47:35.323

In short, extract that index (1 or 2...) from a line of data, use it to form the right key, and add the line to that key's arrayref, relying on autovivification:

push @{$data{"state_$1"}}, $line  if $line =~ /input_([0-9])/;

On lines with data of the form input_NN, the regex captures the first digit after input_, and the line is added to the arrayref at the matching key. The key name is built from the captured digit and the fixed prefix state_. The very first time a key is used, when it isn't in the hash yet, it is created by the feature called autovivification.^†
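To make the capture-and-push step concrete, here is a minimal sketch that runs the same statement over a few of the question's tags held in an array (the array is only for illustration; the real data comes from the file):

```perl
use warnings;
use strict;

# A few sample tags from the question, inlined for illustration only
my @lines = qw(state_1 state_2 input_11 input_12 input_21 input_31);

my %data;
for my $line (@lines) {
    # $1 is the first digit after "input_"; lines like state_1 do not match
    push @{ $data{"state_$1"} }, $line if $line =~ /input_([0-9])/;
}

# Print each key with the tags collected under it
for my $key (sort keys %data) {
    print "$key: @{ $data{$key} }\n";
}
```

Note that only the first digit is captured, so input_11, input_12 and input_13 all land under state_1.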

Then there are a few details. If what stands in real data in place of the token names input and state isn't known ahead of time, it can be extracted from the first line of data and then used as above. The newline needs to be removed as each line is read. Altogether:

use warnings;
use strict;
use feature 'say';

use Data::Dump qw(dd);   # to see complex data; or, use core Data::Dumper

my %data;

while (<>) {  # reads line by line the files given on command line
    chomp;

    push @{$data{"state_$1"}}, $_  if /input_([0-9])/;
}

dd \%data;

This assumes that the keys/data have fixed prefixes and that those are known (state, input).
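If the prefixes are not known in advance, one possible approach is sketched below. It assumes that the first line is always a key line of the form PREFIX_N and that every line starting with a different prefix is data; the helper name group_by_index is made up for this illustration.

```perl
use warnings;
use strict;

# Hypothetical helper: takes the key prefix from the first line, then groups
# all other lines under "<prefix>_<first digit after the underscore>"
sub group_by_index {
    my @lines = @_;
    die "No input lines\n" unless @lines;

    # Assumption: the first line looks like PREFIX_N (e.g. state_1)
    my ($key_prefix) = $lines[0] =~ /^([^_]+)_/
        or die "Unexpected first line: $lines[0]\n";

    my %data;
    for my $line (@lines) {
        next if $line =~ /^\Q$key_prefix\E_/;   # skip the key lines themselves
        push @{ $data{"${key_prefix}_$1"} }, $line if $line =~ /_([0-9])/;
    }
    return \%data;
}

my $grouped = group_by_index(qw(state_1 state_2 input_11 input_12 input_21));
print "$_: @{ $grouped->{$_} }\n" for sort keys %$grouped;
```

With real data the list of lines would come from the file, chomped, instead of the literal list used here.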

A one-liner example

perl -MData::Dump -wnlE'push @{$h{"state_$1"}}, $_ if /input_([0-9])/; }{ dd \%h' file

where file contains what is given in the question. That }{ closes the implicit while loop that the -n switch wraps around the code and opens a new block, so the code after it is executed once, after all lines have been read and processed, much like an END block.
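For readers unfamiliar with }{, the one-liner can be written out roughly as the script below. This is a sketch: the input is held in a string so it runs on its own (the real one-liner reads the named file via <>), and plain print stands in for dd.

```perl
use warnings;
use strict;

# Sample input in a string, read through an in-memory filehandle
my $text = "state_1\ninput_11\ninput_12\ninput_21\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %h;
while (<$fh>) {     # -n supplies this loop (as: LINE: while (<>) { ... })
    chomp;          # -l supplies the chomp (and sets $\ for output)
    push @{ $h{"state_$1"} }, $_ if /input_([0-9])/;
}                   # the "}" of "}{" closes the loop here
{                   # the "{" of "}{" opens a block that runs once, after the loop
    print "$_: @{ $h{$_} }\n" for sort keys %h;
}
```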

^† This is a feature whereby a needed object is created when an undefined value (where the object would be) is dereferenced in an "lvalue context" (where it needs to be "modifiable").

A specific example above: we "use" (dereference) a key state_1 (etc) by adding to the arrayref that is its value, push @{ $data{state_1} }, $value -- but there is no such key there the first time! Well, it's made for us on the fly.

See for instance an article from The Effective Perler, and then there is far more of it around. Here are some generic examples and discussion, and here is a trickier example of when it kicks in or not.

Entry from perlglossary

In Perl, storage locations (lvalues) spontaneously generate themselves as needed, including the creation of any hard reference values to point to the next level of storage. The assignment $a[5][5][5][5][5] = "quintet" potentially creates five scalar storage locations, plus four references (in the first four scalar locations) pointing to four new anonymous arrays (to hold the last four scalar locations). But the point of autovivification is that you don’t have to worry about it.
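A small sketch of autovivification in action, using the push from this answer, the glossary's own assignment, and one well-known side effect (the variable names are just for illustration):

```perl
use warnings;
use strict;

# The push from the answer: no key state_1 exists yet, so the array
# reference is created on the fly when the undefined value is dereferenced
my %data;
push @{ $data{state_1} }, 'input_11';
print ref($data{state_1}), "\n";     # ARRAY

# The perlglossary example: the nested arrays are created as needed
my @a;
$a[5][5][5][5][5] = "quintet";
print scalar(@a), "\n";              # 6, since index 5 now exists

# Side effect to be aware of: testing a nested key creates the outer level
my %seen;
my $found = exists $seen{outer}{inner};
print exists $seen{outer} ? "outer was created\n" : "outer was not created\n";
```

The last case is the usual reason to test each level separately, or to use the CPAN autovivification pragma (no autovivification;) when such side effects matter.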

@Matthias Yes, Perl is very expressive and powerful -- that's just one line of code! (For the processing itself.) Let me know if more questions pop up — zdim, Jan 30 '22 at 20:20
