-1

I have a db of places people have ordered items from. I parsed the list to get the city and state so it prints like this - city, state (New York, NY) etc....

I use the variables $city and $state but I want to count how many times each city and state occur so it looks like this - city, state, count (Seattle, WA 8)

I have all of it working except the count .. I am using a hash but I can't figure out what is wrong with this hash:

if ($varc==3) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ($vars==5) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;

        # foreach $count (keys %counts){
        #   $counts = {$city, $state} {$count}++;
        #   print $counts;
        # }

    print "$city, $state\n";

    }

foreach $count (keys %counts){
$counts = {$city, $state} {$count}++;
print $counts;
}
R.Stone
  • 15
  • 1
  • 4

2 Answers2

1

Instead of printing city and state you can build a "location" string with both items and use the following counting code:

# Declare this variable before starting to parse the locations.
my %counts = ();

# Inside of the loop that parses the city and state, let's assume
# that you've got $city and $state already...

  my $location = "$city, $state";
  $counts{$location} += 1;
}

# When you've processed all locations then the counts will be correct.
foreach $location (keys %counts) {
  print "OK: $location => $counts{$location}\n";
}

# OK: New York, NY => 5
# OK: Albuquerque, NM => 1
# OK: Los Angeles, CA => 2
maerics
  • 151,642
  • 46
  • 269
  • 291
  • I don't understand how you have [at]places = ( followed by the city and state. I have code that when runs prints city and state but not with the [at]places and in this type of list so I have no idea how to get a list formatted after the ( in the [at]places... unless I copy paste all this and manually make the list but that seems to be completely defeating the purpose – R.Stone Nov 21 '17 at 15:27
  • You say that you "parsed the list to get the city and state" so it sounds like you've already got a list containing those parsed items in an array. If so, then all you need to do is the rest of the code afterwards. – maerics Nov 21 '17 at 15:31
  • Hmmm...so when I run the full code that is listed above and it hits the print $city, $state line (without the hash) it returns the following in my cmd window: city, state city, state city, state with a \n between each pair ... so does that mean this is an array? I am sorry but I just don't understand. I thought I had made it through the hard part by get through the jumbled mess of information and pulling out the city, state.... all I need is the count but this seems to be way more difficult than I thought it would be. – R.Stone Nov 21 '17 at 15:34
  • Can I create this by doing the following: [at]places=($city, $state); ? – R.Stone Nov 21 '17 at 15:39
  • Ah ok, I just saw your updated question. If you've got city and state in separate variables then you can just build a string containing them both and use similar code as above. I'l update my answer.... – maerics Nov 21 '17 at 15:39
  • Thank you, I will try this out. – R.Stone Nov 21 '17 at 15:47
  • Ok, it's not throwing an error but it isn't working properly.... instead of returning city, state, number ... it is returning city, state =>1 for each line with the city, state being correct but the number is =>1 for all. – R.Stone Nov 21 '17 at 15:54
  • Ah, ok I figured out what the hangup was. It is working, thank you!!!! – R.Stone Nov 21 '17 at 15:56
  • Huh, I ended up reproducing your answer in different words because I didn't see your edit. – simbabque Nov 21 '17 at 16:01
1

This is going to be a mix of an answer and a code review. I will start with a warning though.

You are trying to parse what looks like XML with Regular Expressions. While this can be done, it should probably not be done. Use an existing parser instead.

How do I know? Stuff that is between angle brackets looks like the format is XML, unless you have a very weird CSV file.

#             V            V
$line =~ /(?:\>)(\w+.*)(?:\<)/;

Also note that you don't need to escape < and >, they have no special meaning in regex.


Now to your code.

First, make sure you always use strict and use warnings, so you are aware of stuff that goes wrong. I can tell you're not because the $count in your loop has no my.

What's $vars (with an s), and what's $varc (with a c). I am guessing that has to do with the state and the city. Is it the column number? In an XML file? Huh.

$line =~ /(?:\>)((\w+.*))(?:\<)/;

Why are there two capture groups, both capturing the same thing?


Anyway, you want to count how often each combination of state and city occurs.

foreach $count (keys %counts){
$counts = {$city, $state} {$count}++;
print $counts;
}

Have you run this code? Even without strict, it gives a syntax error. I'm not even sure what it's supposed to do, so I can't tell you how to fix it.

To implement counting, you need a hash. You got that part right. But you need to declare that hash variable outside of your file reading loop. Then you need to create a key for your city and state combination in the hash, and increment it every time that combination is seen.

my %counts;    # declare outside the loop
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $varc == 3 ) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ( $vars == 5 ) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;

        print "$city, $state\n";

        $count{"$city, $state"}++;    # increment when seen
    }
}

You have to parse the whole file before you can know how often each combination is in the file. So if you want to print those together, you will have to move the printing outside of the loop that reads the file, and iterate the %count hash by keys at a later point.

my %counts;    # declare outside the loop
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $varc == 3 ) {
        $line =~ /(?:\>)(\w+.*)(?:\<)/;
        $city = $1;
    }
    if ( $vars == 5 ) {
        $line =~ /(?:\>)((\w+.*))(?:\<)/;
        $state = $1;

        $count{"$city, $state"}++;    # increment when seen
    }
}

# iterate again to print final counts
foreach my $item ( sort keys %counts ) {
    print "$item $counts{$item}\n";
}
simbabque
  • 53,749
  • 8
  • 73
  • 136
  • I think I'd go with `$count{$city}{$state}++` instead of `$count{"$city, $state"}++`. It's not mentioned in the question, but I bet at some point he's going to want to report on cities by state. – Dave Cross Nov 21 '17 at 17:08
  • 1
    @DaveCross I was going to mention that, but it was already long. And then I forgot. :D – simbabque Nov 21 '17 at 17:10