2

I have the following code that in the sort comparator, it removes a prefix string before doing the comparison.

print for sort { 
   $a =~ s/^STRING//;  
   $b =~ s/^STRING//;  
   foo($a) cmp foo($b)     
} @a;  

Although the comparison and order is correct, the prefix string is being removed from the output.
The following keeps the prefix string (as I wanted).

print for sort { 
   $x = a;  
   $y = b;  
   $x =~ s/^STRING//;  
   $y =~ s/^STRING//;  
   foo($x) cmp foo($y)     
} @a;    

But I am confused how the second part keeps the prefix.
Is it doing a copy of the string and in the case of the array strips the original reference?
Also am I doing something wrong in the first snippet and end up with the issue?

Jim
  • 3,845
  • 3
  • 22
  • 47
  • You can define `$a` and `$b` as lexicals (with `my`) and remove STRING inside `foo`. Since they are lexicals, they will stay just the way the are. You can even [memoize foo](https://perldoc.perl.org/Memoize.html) so that it gets cached. – jjmerelo May 03 '18 at 07:48
  • 1
    `$a` and `$b` are package global variables, and they are used by `sort` internally so if you modify them it will affect the search result as you observed. You should treat them as read only, and follow the suggestion given by @jjmerelo in his comment above – Håkon Hægland May 03 '18 at 07:57
  • @jjmerelo: Could you elaborate with some details please? – Jim May 03 '18 at 08:50
  • @Jim elaborated [in a full answer](https://stackoverflow.com/a/50151055/891440) – jjmerelo May 03 '18 at 09:06

2 Answers2

7

One way: use the /r modifier

print for sort {
    foo( $a =~ s/^STRING//r ) cmp foo( $b =~ s/^STRING//r )
} @ary

with which the substitution operator s/ returns the modified string and leaves the original unchanged. If there is no match the original string is returned what seems to fit the purpose. See the end for an optimization if this is used on large arrays, or if the function call takes time.

Another way would be to change the substitution to match-and-capture, when feasible.


There is an extensive discussion of this in sort. First, $a and $b are (package) globals

The $a and $b are set as package globals in the package the sort() is called from

The block { } stands for an anonymous subroutine and

... the elements to be compared are passed into the subroutine as the package global variables $a and $b

so the aliasing implies that changing them affects the elements. Thus

The values to be compared are always passed by reference and should not be modified.

what does happen when $a, $b get changed so the elements change.

In the second case you copy $a and $b into (what should be!) lexicals $x and $y and the connection with @ary is broken so the elements aren't changed.

Please always have use warnings; and use strict; at the beginning of the program. This is an excellent, if extreme, example -- it could matter a lot whether the variables you introduce to try things out are global ($x) or lexical (my $x).


Code that processes elements to use the resulting value for sort comparison has an efficiency flaw. The comparison is done on two elements at a time, so elements get processed multiple times. And each time we do the same processing and compute the same value for an element.

This inefficiency is notable only for large enough data sets and most of the time one need not worry. But in this case a regex engine runs and there is also a function call involved, and those aren't exactly cheap in Perl. Also, the call's overhead is unspecified and I imagine that there is some work involved.

To optimize this one can pre-process the input and then sort

my @sorted_ary =  
    map { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map { [ foo( s/^STRING//r ), $_ ] }
    @ary;

The map that takes the input @ary applies the regex and the function call and stores that result, along with the original element, in a two-element arrayref, for each element of @ary. This list of arrayrefs is then sorted, using the first arrayref element for comparison. The last map extracts the second element from sorted arrayrefs, thus returning the original items sorted as needed.

This is called Schwartzian transform. See "Examples" in sort, for example.

One should keep in mind that the gains are notable only for large enough data while this maneuver comes with an overhead as well (and in a far more complex code). So consider using it only when there is a demonstrable problem of this kind with sorting.

zdim
  • 64,580
  • 5
  • 52
  • 81
1

Please consider this code, which elaborates on my previous comment

use strict;
use warnings;

use v5.14;

my @array=qw(a b c d foo bar baz); #This creates the array
my @prefixed = map { "STRING$_"} @array;

#This is the sorting part we're interested in
say for sort { sans_prefix($a) cmp sans_prefix($b) } @prefixed;

# This sub does the extraction
sub sans_prefix {
  my ($no_prefix) = ($_[0] =~ /STRING(\w+)/);
  return $no_prefix;
}

The logic that extracts the part that is not prefix is stashed away in sans_prefix. We're doing no substitution, but anyway we're acting on a lexical ($_[0]) and not modifying it, so this returns

STRINGa
STRINGb
STRINGbar
STRINGbaz
STRINGc
STRINGd
STRINGfoo
jjmerelo
  • 22,578
  • 8
  • 40
  • 86