I have been looking at an old post about sorting an array by using a regular expression in Perl. The original post is here

I am struggling to totally understand the script that was voted as the ‘correct’ answer. The original post was regarding sorting the array below:

my @array = (
  "2014 Computer Monitor 200",
  "2010 Keyboard 30",
  "2012 Keyboard 80",
  "2011 Study Desk 100"
);

The question was how to use regular expressions in Perl to sort the entire array by year, item name, or price. For example, if the user wants to sort by price, they type 'price' and the array is sorted like this:

2010 Keyboard 30
2012 Keyboard 80
2011 Study Desk 100
2014 Computer Monitor 200

A solution was proposed that uses a Schwartzian transform. I have just started to learn about this, and this script is a little different from the other examples I've seen. The script that was voted as the correct answer is below. I am looking for advice on how it works.

my $order = "price";
my @array = (
  "2014 Computer Monitor 200",
  "2010 Keyboard 30",
  "2012 Keyboard 80",
  "2011 Study Desk 100"
);

my %sort_by = (
  year  => sub { $a->{year}  <=> $b->{year} },
  price => sub { $a->{price} <=> $b->{price} },
  name  => sub { $a->{name}  cmp $b->{name} },
);
@array = sort {

  local ($a, $b) = map {
    my %h; 
    @h{qw(year name price)} = /(\d+) \s+ (.+) \s+ (\S+)/x;
    \%h;
  } ($a, $b);
  $sort_by{$order}->();

} @array;

# S. transform
# @array =
#  map { $_->{line} }
#  sort { $sort_by{$order}->() }
#  map { 
#    my %h = (line => $_); 
#    @h{qw(year name price)} = /(\d+) \s+ (.+) \s+ (\S+)/x;
#    $h{name} ? \%h : ();
#  } @array;

use Data::Dumper; print Dumper \@array;

I know the script is using the regular expression /(\d+) \s+ (.+) \s+ (\S+)/x to match the year, name, and price.
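To make the captures concrete, here is a minimal check of that pattern against one sample line. It is only a sketch: the variable names $line, $year, $name and $price are chosen for illustration and do not appear in the original script.

```perl
use strict;
use warnings;

# Run the pattern from the script against one sample line.
# With /x, literal spaces in the pattern are ignored, so only \s+ matches whitespace.
my $line = "2014 Computer Monitor 200";
my ($year, $name, $price) = $line =~ /(\d+) \s+ (.+) \s+ (\S+)/x;

# (.+) is greedy but backtracks so that the last field is left for (\S+).
print "year=$year name=$name price=$price\n";
# prints: year=2014 name=Computer Monitor price=200
```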

I think the rest of the script works as below:

• The initial sort (the line starting @array = sort {) takes in items from @array two at a time, one in $a and one in $b

• The map function then takes items $a and $b and maps each to a hash - each item becomes a hash with keys 'year', 'price', and 'name'. This is based on the regex /(\d+) \s+ (.+) \s+ (\S+)/x

• Map returns the two hashes, as references, to local variables $a and $b

• I think it is necessary to use local $a and $b, otherwise sort will use the default $a and $b taken in at the start of the sort block?

• The 'price' sort function is stored as a coderef in the %sort_by hash

• This is called by the code $sort_by{$order}->() on the local versions of $a and $b

This is repeated until all items are returned to @array by the initial sort

Please can anyone tell me if I'm on the right lines here, or correct any misunderstandings. Also can you advise on the use of the local $a and $b variables.

thanks J

John D
  • For `local` see [here](https://stackoverflow.com/questions/129607/what-is-the-difference-between-my-and-local-in-perl). – ceving Jun 03 '21 at 11:30

1 Answer

A Schwartzian transform is a way to avoid computing the sort keys more often than necessary. The accepted solution, the one with local ($a, $b), has exactly that problem: it re-runs the regex on both lines for every single comparison.
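To see the difference in key computations, here is a rough sketch that counts how often the key is extracted. The counter $parses and the helper price_of are hypothetical names made up for this illustration, and the sketch sorts by price only.

```perl
use strict;
use warnings;

my @array = (
  "2014 Computer Monitor 200",
  "2010 Keyboard 30",
  "2012 Keyboard 80",
  "2011 Study Desk 100",
);

my $parses = 0;   # hypothetical counter, for illustration only

# Hypothetical helper: extract the trailing price and count each call.
sub price_of {
  my ($line) = @_;
  $parses++;
  my ($price) = $line =~ /(\S+)\s*$/;
  return $price;
}

# Naive sort: the key is recomputed for both operands on every comparison.
my @naive = sort { price_of($a) <=> price_of($b) } @array;
my $naive_parses = $parses;

# Schwartzian transform: the key is computed once per element.
$parses = 0;
my @st =
  map  { $_->[0] }                      # keep only the original line
  sort { $a->[1] <=> $b->[1] }          # compare the cached prices
  map  { [ $_, price_of($_) ] } @array; # pair each line with its price
my $st_parses = $parses;

print "naive: $naive_parses parses, transform: $st_parses parses\n";
```

The transform always parses each line once (four parses here), while the naive count is twice the number of comparisons the sort happens to make, so the gap grows with the length of the list.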

The steps of a S. transform are basically:

  • use a map to enrich the list elements with computed sort keys. Here, %h is used as the new element, containing the original line under the key line
  • use a sort to order this enriched list. In the original script this is the sort that relies on a bit of dirty $a $b magic.
  • use a map to extract the original list elements. Here that means extracting the value of the line key.
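The three steps can also be written out one at a time, which may be easier to follow than the chained form. This is just a sketch: the intermediate arrays @records and @sorted_records are names invented here, and the sort key is fixed to price.

```perl
use strict;
use warnings;

my @array = (
  "2014 Computer Monitor 200",
  "2010 Keyboard 30",
  "2012 Keyboard 80",
  "2011 Study Desk 100",
);

# Step 1: enrich each line with its parsed fields, keeping the original under 'line'.
my @records = map {
  my %h = (line => $_);
  @h{qw(year name price)} = /(\d+) \s+ (.+) \s+ (\S+)/x;
  \%h;
} @array;

# Step 2: sort the records by a precomputed key (price here).
my @sorted_records = sort { $a->{price} <=> $b->{price} } @records;

# Step 3: throw the keys away and keep only the original lines.
my @sorted = map { $_->{line} } @sorted_records;

print "$_\n" for @sorted;
```

Chaining the three statements into one map/sort/map expression gives the usual form of the transform; the intermediate arrays are only there to make each step visible.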

A note on $a and $b

Very sadly, $a and $b are global (package) variables in Perl. sort assigns them automagically for each comparison inside a sort block, as in sort { $a <=> $b } (3,2,1)

This explains why the S. solution works even though the compared elements are not passed as arguments to the sorting subs: the subs simply read the globals. It also explains the need for local (another Perl horror, which gives a global variable a temporary value) in the naive solution, so that its sorting subs get the hash references rather than the raw lines in $a and $b.
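Both behaviours can be seen in isolation in a small sketch. The names by_number, show and $setting are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# A named sub can read $a and $b because they are package variables,
# not lexicals belonging to the sort block.
sub by_number { $a <=> $b }

my @sorted = sort { by_number() } (3, 2, 1);
print "@sorted\n";    # prints: 1 2 3

# local gives a package variable a temporary value for the dynamic scope:
# subs called from inside the block see it, and the old value returns afterwards.
our $setting = "outer";
sub show { return $setting }

my $inside;
{
  local $setting = "inner";
  $inside = show();       # "inner"
}
my $after = show();       # "outer"
print "$inside $after\n"; # prints: inner outer
```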

I strongly encourage you to forget about this and avoid implicit use of $a and $b any deeper than the sort block itself.

A slightly more understandable version would be:

use strict;
use warnings;

my $order = "price";
my @array = (
  "2014 Computer Monitor 200",
  "2010 Keyboard 30",
  "2012 Keyboard 80",
  "2011 Study Desk 100"
);

# Each comparator receives the two records as explicit arguments.
my %sort_by = (
  year  => sub { $_[0]{year}  <=> $_[1]{year} },
  price => sub { $_[0]{price} <=> $_[1]{price} },
  name  => sub { $_[0]{name}  cmp $_[1]{name} },
);

my @sorted =
  map { $_->{line} }                    # 3. extract the original line
  sort { $sort_by{$order}->($a, $b) }   # 2. sort the enriched records
  map {                                 # 1. parse each line once
    my %h = (line => $_); # $_ is the array element (the input line)
    @h{qw(year name price)} = ( $_ =~ /(\d+) \s+ (.+) \s+ (\S+)/x );
    # Did the regex capture a name, i.e. did it match?
    # The empty list makes an invalid line disappear, but you can choose to do something else with it.
    defined $h{name} ? \%h : ();
  } @array;

print join("\n", @sorted), "\n";
jeje
  • Thanks for your alternative script. I find it much more understandable, and in keeping with what I'm reading about the Schwartzian transform. – John D Jun 04 '21 at 08:44
  • I think I was on the right lines in my analysis of how the original solution script works... – John D Jun 04 '21 at 08:46