This is similar to the question "unix - count occurrences of character per line/field", but I need counts for every character at every position on the line.
I have a file of about 1e7 lines, each about 500 characters long. From it I want a two-dimensional summary structure like `$summary{'a','b','c','0','1','2'}[pos 0..499] = count_integer` that shows how many times each character appears at each position of the line. Either order of dimensions is fine.
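In Perl terms, that structure is a hash of arrays (character, then position). The minimal sketch below builds it from two hypothetical sample lines. It also builds the transposed array-of-hashes layout (position, then character) to show that either orientation holds the same counts. The names `%by_char` and `@by_pos` and the sample input are illustrative only.

```perl
use strict;
use warnings;

# Two equivalent layouts for per-position character counts:
#   %by_char : char     => [ count at pos 0, count at pos 1, ... ]  (hash of arrays)
#   @by_pos  : position => { char => count, ... }                   (array of hashes)
my @lines = ( 'aab', 'abb' );    # hypothetical sample input

my ( %by_char, @by_pos );
for my $line (@lines) {
    for my $pos ( 0 .. length($line) - 1 ) {
        my $ch = substr( $line, $pos, 1 );
        ++$by_char{$ch}[$pos];    # autovivifies the array ref for $ch
        ++$by_pos[$pos]{$ch};     # autovivifies the hash ref for $pos
    }
}

# The same count, reached in either order.
print "b at position 1: $by_char{b}[1] / $by_pos[1]{b}\n";

# A character that never occurs at a position leaves a hole (undef)
# in the hash-of-arrays layout; 'b' never appears at position 0 here.
print "b at position 0 is ", ( defined $by_char{b}[0] ? "set" : "undef" ), "\n";
```

Those undef holes matter when printing whole rows with `@{...}` under `use warnings`, because each hole triggers an "uninitialized value" warning.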
My first method did `++$summary{char}[pos]` while reading each line. Because many lines are identical, it was much faster to count identical lines first and then add `$summary{char}[pos] += n` once per distinct line.
Are there more idiomatic or faster ways than the following C-like 2d loop?
#!perl
use strict;
use warnings;
my ( %summary, %counthash );    # perl 5.8.9
# count every character at every position, one line at a time
sub method1 {
    print "method1\n";
    while (<DATA>) {
        chomp;    # don't count the trailing newline as a character
        my @c = split( //, $_ );
        ++$summary{ $c[$_] }[$_] foreach ( 0 .. $#c );
    }    # wend
}    ## end sub method1
# count identical lines first, then add each distinct line's count once
sub method2 {
    print "method2\n";
    while (<DATA>) {    # slurpsum the whole file
        chomp;          # so "aaaaa\n" and a final "aaaaa" are the same key
        ++$counthash{$_};
    }
    foreach my $str ( keys %counthash ) {
        my $n = $counthash{$str};
        my @c = split( //, $str );
        $summary{ $c[$_] }[$_] += $n foreach ( 0 .. $#c );
    }    #rof my $str
}    ## end sub method2
# MAINLINE
if ( rand() > 0.5 ) { method1() } else { method2() }
print "char $_ : @{$summary{$_}} \n" foreach ( 'a', 'b' );
# both methods print this summary:
# char a : 3 3 2 2 3
# char b : 2 2 3 3 2
__DATA__
aaaaa
bbbbb
aabba
bbbbb
aaaaa
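For comparison, here is a sketch of the same count-identical-lines-first idea written without the temporary `@c` array, indexing each distinct line with `substr` instead. Two assumptions apply. First, `summarize_lines` is a hypothetical helper that takes an array ref of already-chomped lines rather than reading `<DATA>`. Second, I have not measured whether it beats `split //` on the real 1e7-line file; the core `Benchmark` module's `cmpthese` is the usual way to check that.

```perl
use strict;
use warnings;

# Hypothetical helper: same two passes as method2, but it reads from an
# array ref of chomped lines and uses substr instead of split //.
sub summarize_lines {
    my ($lines) = @_;

    # Pass 1: collapse identical lines into line => number of occurrences.
    my %count;
    ++$count{$_} for @$lines;

    # Pass 2: add each distinct line's multiplicity once per position.
    my %summary;
    while ( my ( $str, $n ) = each %count ) {
        $summary{ substr( $str, $_, 1 ) }[$_] += $n for 0 .. length($str) - 1;
    }
    return \%summary;
}

my $summary = summarize_lines( [qw(aaaaa bbbbb aabba bbbbb aaaaa)] );

for my $ch (qw(a b)) {
    # "|| 0" prints 0 instead of undef for any position before $ch's
    # last occurrence where $ch never appears
    my @row = map { $summary->{$ch}[$_] || 0 } 0 .. $#{ $summary->{$ch} };
    print "char $ch : @row\n";
}
```

With the same five sample lines as the `__DATA__` section, this prints `char a : 3 3 2 2 3` and `char b : 2 2 3 3 2`, matching both methods.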