1

I have tab delimited data with multiple columns.

I have OS names in column 31 and data bytes in columns 6 and 7. What I want to do is count the total volume of each unique OS.

So, I did something in Perl like this:

#!/usr/bin/perl
use warnings;

my @hhfilelist  = glob "*.txt";
my %count = ();

for my $f (@hhfilelist) {
    open F, $f || die "Cannot open $f: $!";
    while (<F>) {
        chomp;
        my @line = split /\t/;
        # counting volumes in col 6 and 7 for 31
        $count{$line[30]} = $line[5] + $line[6];     
    }
    close (F);
}

my $w = 0;

foreach $w (sort keys %count) {
    print "$w\t$count{$w}\n";
}

So, the result would be something like

Windows    100000
Linux        5000
Mac OSX     15000
Android      2000

But there seems to be some error in this code because the resulting values I get aren't as expected.

What am I doing wrong?

DVK
  • 126,886
  • 32
  • 213
  • 327
sfactor
  • 12,592
  • 32
  • 102
  • 152
  • I'd consider up-voting your question (on style, not the bug itself) if you actually provided sample input causing the problem, actual output and expected output – DVK Feb 17 '11 at 23:26

3 Answers3

6

It seems like you're not actually ADDING up the counts - you overwrite the last count for any OS with the count from the last line for that OS.

$count{$line[30]} = $line[5] + $line[6];

Should be

$count{$line[30]} += $line[5] + $line[6];

As additional considerations that can improve your code overall but don't affect the correctness of it:

  1. Please use 3-argument form of open and Lexical filehandles:

     open(my $filehandle, "<", $f) || die "Cannot open $f: $!";
    
  2. If you're 100% sure you file doesn't contain quoted field values or tabs in the field contents, your split based logic is OK. For really complicated X-separated files, I would strongly recommend using Text::CSV_XS/Text::CSV CPAN module

  3. Don't need to initialize %count or $w variables - hash will get autoinitialized to empty hash, and $w gets assigned to as loop variable - you may want to actually declare it in the loop itself: foreach my $w (sort keys %count) {

  4. Please don't use 1-letter variables. $w is meaningless in the last loop, whereas $os_name is clear.

DVK
  • 126,886
  • 32
  • 213
  • 327
  • well that was embarrassing. such an elementary mistake :). thanks for the advice. – sfactor Feb 17 '11 at 23:24
  • 1
    @sfactor - if that's the most embarassing bug you ever make, consider yourself lucky :) I had much worse – DVK Feb 17 '11 at 23:25
3

Your expression

open F, $f || die "Cannot open $f: $!";

has a subtle bug in it that will eventually bite you, though probably not today.

The || operator has higher precedence than the comma-operator to its left and so this expression actually gets parsed as

open F, ($f || die "Cannot open $f: $!")

which is to say, you will die when $f has a false (0, "", or undef) value, not when the open statement fails to open the file with the name given by $f.

To do what you mean, you could either use parentheses:

open (F, $f) || die ...

or use the alternate low-precedence or operator

open F, $f  or  die ...

(At times I have been bitten by this myself)

Community
  • 1
  • 1
mob
  • 117,087
  • 18
  • 149
  • 283
2
$count{$line[30]} = $line[5] + $line[6];

should use the += operator to add the row's sum to the total, rather than set it as the total:

$count{$line[30]} += $line[5] + $line[6];
SomeRandomGuy
  • 486
  • 2
  • 6