How do I sort the frequency of repetition ($n) from high to low in Perl?

Question

I have this code, which works well for finding the lines common to multiple files. I just have no idea how to sort the output from the highest repetition count to the lowest. Instead of 5,3,2,6,4,5,6, I want the output sorted as 6,6,5,5,4,3,2.

The Output.txt

For line --> five
This line occurs 5 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt
For line --> three
This line occurs 3 times in the following files: - 
a.txt,
b.txt,
c.txt
For line --> two
This line occurs 2 times in the following files: - 
a.txt,
b.txt
For line --> eight
This line occurs 6 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt,
f.txt
For line --> four 
This line occurs 4 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt
For line --> six
This line occurs 5 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt
For line --> seven
This line occurs 6 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt,
f.txt
The total common line between files are 7

The script (Perl)

#!/usr/bin/perl -w
my %hash; 
my $file;
my $fh;
my $count;

for $file (@ARGV) {
    open ($fh, $file) or die "$file: $!\n";
    while(<$fh>) {
        push @{$hash{ $_}}, $file;
    } 
}
for (keys %hash) {
    $n = @{$hash{$_}};
    if(@{$hash{$_}} > 1) {
        $count ++;
        print "\n For line --> $_\n";
        print "This line occurs $n times in the following files: - \n", join(",\n", @{$hash{$_}}), "\n\n";
    }
}
print "The total common line between files are $count\n";  
exit 0;

Use `sort` on the size of the array for each key value. You can probably get away with `sort { @{ $hash{$b} } <=> @{ $hash{$a} } } keys %hash`. – TLP, Aug 24 '22 at 10:47
Tips: [Prefer `use warnings;` over `-w`](https://stackoverflow.com/questions/221919/should-i-turn-on-perl-warnings-with-the-command-line-switch-or-pragma) and always `use strict;`. — Shawn, Aug 24 '22 at 10:55

Shawn · Answer 1 · 2022-08-24T14:56:42.853

0

You have to sort the list of keys instead of using the arbitrary order that `keys` returns. A common way to do so efficiently in Perl is a Schwartzian Transform:

for (map  { $_->[0] }
     sort { $b->[1] <=> $a->[1] }
     map  { [ $_, scalar @{$hash{$_}} ] }
     keys %hash) {
    # ...
}

edited Aug 24 '22 at 14:56

answered Aug 24 '22 at 10:51

Shawn


Though in this example you might want to keep the list of `key, length` tuples to re-use the length inside the loop; in that case you'd leave off the first `map`. – Shawn Aug 24 '22 at 10:57
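A minimal sketch of that variant, assuming a small hand-made `%hash` (the sample lines and file names are made up for illustration). The `[key, count]` pairs are sorted and kept, so the loop body can reuse the count instead of recomputing it:

```perl
use strict;
use warnings;

# Hypothetical sample data: line text => list of files containing it.
my %hash = (
    "five\n"  => [qw( a.txt b.txt c.txt d.txt e.txt )],
    "three\n" => [qw( a.txt b.txt c.txt )],
    "two\n"   => [qw( a.txt b.txt )],
    "seven\n" => [qw( a.txt b.txt c.txt d.txt e.txt f.txt )],
);

# Build [ key, count ] pairs and sort them by count, highest first.
# The final unwrapping map is left off, so each pair keeps its count.
my @pairs = sort { $b->[1] <=> $a->[1] }
            map  { [ $_, scalar @{ $hash{$_} } ] }
            keys %hash;

for my $pair (@pairs) {
    my ($line, $n) = @$pair;
    next if $n < 2;    # only report lines shared by more than one file
    print "For line --> $line";
    print "This line occurs $n times in the following files: - \n",
          join(",\n", @{ $hash{$line} }), "\n\n";
}
```

Keys with equal counts come out in no particular order relative to each other; add a secondary comparison if that matters.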

score -1 · Accepted Answer · answered Aug 24 '22 at 15:09

You can use the following:

sort { @{ $hash{$b} } <=> @{ $hash{$a} } } keys %hash
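Dropped into the question's script, a sort of this kind could look like the sketch below. It assumes the same `%hash` of line => file names built from `@ARGV`. The helper name `keys_by_count` is hypothetical, and the `or $a cmp $b` tie-break is an extra that is not in the original, added only so that equal counts print in a predictable order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the keys of %$h ordered by how many
# files each line was seen in, highest first; ties sorted by line text.
sub keys_by_count {
    my ($h) = @_;
    return sort { @{ $h->{$b} } <=> @{ $h->{$a} } or $a cmp $b } keys %$h;
}

# Same data structure as in the question: line text => list of files.
my %hash;
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "$file: $!\n";
    while (my $line = <$fh>) {
        push @{ $hash{$line} }, $file;
    }
    close $fh;
}

my $count = 0;
for my $line (keys_by_count(\%hash)) {
    my $n = @{ $hash{$line} };
    next unless $n > 1;    # skip lines found in only one file
    $count++;
    print "\n For line --> $line\n";
    print "This line occurs $n times in the following files: - \n",
          join(",\n", @{ $hash{$line} }), "\n\n";
}
print "The total common line between files are $count\n";
```

An array in numeric context evaluates to its element count, which is why `@{ ... } <=> @{ ... }` compares the number of files directly.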

You can also use the phenomenal Sort-Key distribution.

use Sort::Key qw( rukeysort );

rukeysort { 0+@{ $hash{$_} } } keys %hash

Using a Schwartzian Transform was suggested. I don't think that's a good solution here.

Without testing, it's unclear whether a Schwartzian Transform would actually improve performance here, given all the extra block calls and memory allocations it introduces. It's quite possible that it makes the program both more complex and slower.
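One way to test it is the core `Benchmark` module. The sketch below compares the plain sort against the Schwartzian Transform on synthetic data. The data size, the key names, and the sub names `plain_sort` and `schwartzian` are arbitrary assumptions for illustration, and the results will vary with the Perl version and the machine.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Synthetic data (an assumption for illustration): 10_000 keys,
# each mapped to an array of 1 to 20 placeholder entries.
my %hash = map { ( "line$_" => [ (1) x ( 1 + $_ % 20 ) ] ) } 1 .. 10_000;

# Plain sort: looks up the array size inside every comparison.
sub plain_sort {
    return sort { @{ $hash{$b} } <=> @{ $hash{$a} } } keys %hash;
}

# Schwartzian Transform: computes each size once, sorts, then unwraps.
sub schwartzian {
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           map  { [ $_, scalar @{ $hash{$_} } ] }
           keys %hash;
}

# A negative count means: run each candidate for at least that many
# CPU seconds, then print a comparison table of the rates.
cmpthese( -1, {
    plain       => sub { my @sorted = plain_sort() },
    schwartzian => sub { my @sorted = schwartzian() },
} );
```

The array-size lookup used as the sort key here is very cheap, which is the case where caching keys has the least to gain.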

In fact, it's unclear whether using an ST is ever a good solution. If it's worthwhile to use an ST, you're better off using Sort::Key if you can. It's both simpler and faster.
