This is going to be a mix of an answer and a code review. I will start with a warning though.
You are trying to parse what looks like XML with Regular Expressions. While this can be done, it should probably not be done. Use an existing parser instead.
How do I know? Content between angle brackets suggests the format is XML, unless you have a very weird CSV file.
#             V            V
$line =~ /(?:\>)(\w+.*)(?:\<)/;
Also note that you don't need to escape < and >; they have no special meaning in a regex.
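To illustrate the parser suggestion above, here is a minimal sketch using XML::LibXML from CPAN. The element names (records, record, city, state) are assumptions made up for this example, since the real layout of your file isn't shown.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module, not part of core Perl

# Hypothetical document; the real element names are not known.
my $xml = <<'END';
<records>
  <record><city>Austin</city><state>TX</state></record>
  <record><city>Boston</city><state>MA</state></record>
</records>
END

my $doc = XML::LibXML->load_xml( string => $xml );

my @pairs;
foreach my $record ( $doc->findnodes('//record') ) {
    # findvalue evaluates an XPath expression relative to this node
    my $city  = $record->findvalue('city');
    my $state = $record->findvalue('state');
    push @pairs, "$city, $state";
}

print "$_\n" for @pairs;
```

For a real file you would probably pass location => $filename to load_xml instead of a string, and adjust the XPath expressions to your actual structure.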
Now to your code.
First, make sure you always use strict and use warnings, so you are aware of stuff that goes wrong. I can tell you're not because the $count in your loop has no my.
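As a rough sketch of what that buys you: with strict enabled, a loop variable has to be declared, and forgetting the my becomes a compile-time error. The hash contents and the $undeclared name below are made up for the demonstration, and the string eval is only there to capture the error so it can be printed.

```perl
use strict;
use warnings;

# Sample data, invented for this example.
my %counts = ( 'Austin, TX' => 2, 'Boston, MA' => 1 );

# The loop variable is declared with 'my', so it is scoped to the loop.
foreach my $count ( sort keys %counts ) {
    print "$count\n";
}

# Under strict, an undeclared loop variable does not compile.
# The string eval captures that error instead of ending the program.
my $ok = eval q{ use strict; foreach $undeclared (1) { } 1 };
print "strict complained: $@" unless $ok;
```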
What's $vars (with an s), and what's $varc (with a c)? I am guessing they have to do with the state and the city. Is it the column number? In an XML file? Huh.
$line =~ /(?:\>)((\w+.*))(?:\<)/;
Why are there two capture groups, both capturing the same thing?
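If you do stay with a regex for now, a trimmed-down version might look like the sketch below: one capture group, no escaping of the angle brackets, and a check that the match succeeded before the result is used. The sample line is an assumption, as your input isn't shown.

```perl
use strict;
use warnings;

# Hypothetical input line.
my $line = '<city>Springfield</city>';

# In list context the match returns the captured groups,
# so $value stays undefined when the pattern does not match.
my ($value) = $line =~ />(\w+.*)</;

if ( defined $value ) {
    print "$value\n";    # prints "Springfield"
}
else {
    warn "No match in line: $line\n";
}
```

Assigning the match in list context avoids reading a stale $1 left over from an earlier successful match.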
Anyway, you want to count how often each combination of state and city occurs.
foreach $count (keys %counts){
$counts = {$city, $state} {$count}++;
print $counts;
}
Have you run this code? Even without strict, it gives a syntax error. I'm not even sure what it's supposed to do, so I can't tell you how to fix it.
To implement counting, you need a hash. You got that part right. But you need to declare that hash variable outside of your file reading loop. Then you need to create a key for your city and state combination in the hash, and increment it every time that combination is seen.
my %counts; # declare outside the loop
my ( $city, $state ); # declared here so they survive across lines
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $varc == 3 ) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ( $vars == 5 ) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;
        print "$city, $state\n";
        $counts{"$city, $state"}++; # increment when seen
    }
}
You have to parse the whole file before you can know how often each combination occurs in it. So if you want to print the combinations together with their counts, you will have to move the printing outside of the loop that reads the file, and iterate over the keys of the %counts hash afterwards.
my %counts; # declare outside the loop
my ( $city, $state ); # declared here so they survive across lines
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $varc == 3 ) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ( $vars == 5 ) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;
        $counts{"$city, $state"}++; # increment when seen
    }
}
# iterate again to print final counts
foreach my $item ( sort keys %counts ) {
    print "$item $counts{$item}\n";
}
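Since I don't know where $varc and $vars come from, here is a self-contained sketch of the same counting idea that you can run as-is. It assumes one element per line and hypothetical city and state tag names, and it picks the field by tag name instead of by your column variables, so treat it as an illustration and not as a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical input: one element per line. The real file layout is unknown.
my $data = <<'END';
<city>Austin</city>
<state>TX</state>
<city>Boston</city>
<state>MA</state>
<city>Austin</city>
<state>TX</state>
END

# Open the string as an in-memory file so the loop looks like file reading.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %counts;    # declared outside the loop so it survives across lines
my $city;

while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ m{<city>(.+?)</city>} ) {
        $city = $1;    # remember the city until its state shows up
    }
    elsif ( defined $city and $line =~ m{<state>(.+?)</state>} ) {
        $counts{"$city, $1"}++;    # increment once the pair is complete
        undef $city;
    }
}
close $fh;

# Print only after the whole input has been read.
foreach my $item ( sort keys %counts ) {
    print "$item $counts{$item}\n";
}
```

With the sample data above this prints "Austin, TX 2" followed by "Boston, MA 1".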