I have used sort | uniq -c | sort -n
for years, but today it failed because my input file is 10 GB and my /tmp
holds only 1 GB:
sort: write failed: /tmp/sortmIGbL: No space left on device
Therefore I am looking for an efficient alternative for everyday use:
- awk may be used, but it has no sorted associative array
- perl seems to be a good option, but the 10-year-old solution from perlmonks.org does not seem to work:

    no warnings;
    $^W=0;
    open my $in, $ARGV[0] or die "Couldn't open $ARGV[0]:$!";
    my ($buffer, %h) = '';
    keys %h = 1024*500;
    while (sysread($in, $buffer, 16384, length $buffer)) {
        $h{$1}++ while $buffer =~ m[^(?:.+?\|){9}([^|]+)\|]mg;
        $buffer = substr($buffer, rindex($buffer, "\n"));
    }
    print scalar keys %h;
How can I get the same result as sort | uniq -c | sort -nr | head
on very large files?
- As I use Linux/Cygwin/Solaris/*BSD/... I am open to any idea (portable or not)
- You are free to use whatever scripting language you want (awk/perl/...)
Input example:
a
BB
ccccc
dddddddd
a
BB
a
One of the possible outputs:
3 a
2 BB
1 dddddddd
1 ccccc
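To make the awk idea concrete, here is a minimal sketch of hash-based counting. It assumes the set of distinct lines fits in memory, which may not hold for every 10 GB input. The function name count_top is hypothetical, chosen only for illustration. Since awk cannot sort its associative array, the unsorted counts are piped to sort, which then sees only one line per distinct value instead of the whole input.

```shell
#!/bin/sh
# count_top FILE [N]  (hypothetical helper name, for illustration only)
# Counts identical lines in an awk associative array, then sorts only the
# "count line" pairs. Assumption: the distinct lines fit in memory.
count_top() {
    awk '{ count[$0]++ }                          # one hash entry per distinct line
         END { for (line in count) print count[line], line }' "$1" |
        sort -nr |                                # highest counts first
        head -n "${2:-10}"                        # keep the top N (default 10)
}
```

Called as count_top input.txt on the input example above, this should put "3 a" first and "2 BB" second; the order of the lines with equal counts depends on the sort implementation and locale. Unlike uniq -c, the counts are not left-padded. If the number of distinct lines is itself huge, sort may still need temporary space; GNU sort accepts -T to point at a directory on a larger filesystem.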