
I am banging my head over a Perl task in my Natural Language Processing course that we have been assigned to solve.

What they require us to be able to solve with Perl is the following:

  • Input: the program takes two inputs from stdin in the form and type of; perl program.pl

  • Processing and Output:

    Part 1: the program tokenizes words in filename.txt and stores these words in a hash with their frequency of occurrence

    Part 2: the program uses the input for hashing purposes. If the word cannot be found in the hash (thus in the text), prints out zero as the frequency of the word. If the word CAN indeed be found in the hash, prints out the corresponding frequency value of the word in the hash.

I am sure from experience that my script is already able to DO "Part 1" stated above.

Part 2 needs to be accomplished using a Perl sub (subroutine) which takes the hash by reference, along with the word to look up. This was the part that I had some serious trouble with.

First version, before the major changes Stefan Becker suggested:

#!/usr/bin/perl                                                                           

use warnings;
use strict;

sub hash_4Frequency
{
    my ($hashWord, $ref2_Hash) = @_;                       
    print $ref2_Hash -> {$hashWord}, "\n";  # thank you Stefan Becker, for sobriety
}

my %f = ();  # hash that will contain words and their frequencies                              
my $wc = 0;  # word-count                                       

my ($stdin, $word_2Hash) = @ARGV;  # corrected, thanks to Silvar

while ($stdin)
{
    while ("/\w+/")
    {
        my $w = $&;
        $_ = $";
        $f{lc $w} += 1;
        $wc++;
    }
}

my @args = ($word_2Hash, %f);
hash_4Frequency(@args);

The second version, after some changes:

#!/usr/bin/perl

use warnings;
use strict;

sub hash_4Frequency
{
    my $ref2_Hash = %_;
    my $hashWord = $_;

    print $ref2_Hash -> {$hashWord}, "\n";
}

my %f = ();  # hash that will contain words and their frequencies
my $wc = 0;  # word-count

while (<STDIN>) 
{
    while (/\w+/)
    {
        chomp;
        my $w = $&;
        $_ = $";

        $f{$_}++ foreach keys %f;
        $wc++;
    }
}

hash_4Frequency($_, \%f);

When I execute ' ./script.pl < somefile.txt someWord ' in Terminal, Perl complains (Perl's output for the first version):

 Use of uninitialized value $hashWord in hash element at   
 ./word_counter2.pl line 35.

 Use of uninitialized value in print at ./word_counter2.pl line 35.

What Perl complains about for the second version:

 Can't use string ("0") as a HASH ref while "strict refs" in use at ./word_counter2.pl line 13, <STDIN> line 8390.

At least now I know the script can successfully work until this very last point, and it seems something semantic rather than syntactical.

Any further advice on this last part? Would be really appreciated.

P.S.: Sorry pilgrims, I am just a novice in the path of Perl.

  • I am not sure why the " makes the rest of the code after the middle of while like that, btw. In our servers at university, that does not seem to create such a problem. And yes, I am aware that it is very very possible I have too much to go at Perl. – Ahmet Emre Harsa Feb 14 '19 at 03:06
  • Arguments to the entire script are stored in `@ARGV`, not `@_`. Getting values of `$stdin` and `$word_2Hash` from `@_` means they'll be undefined. – Silvar Feb 14 '19 at 03:06
  • okay just checking thanks – Ahmet Emre Harsa Feb 14 '19 at 03:07
  • I don't know which source you are trying to learn Perl from, but it doesn't seem to have anything to do with Perl. It should be `hash_4Frequency($word_2Hash, \%f);` and `print $ref2_Hash->{$hashWord}, "\n";` – Stefan Becker Feb 14 '19 at 03:14
  • it is our Lecture and Lab notes frankly, maybe either they are really outdated or just trash. $ref2_Hash->{$hashWord}, "\n"; makes sense to me, but since I was working on this for a long time, I diverged into whatever really – Ahmet Emre Harsa Feb 14 '19 at 03:18
  • The input loop should probably be `while () { chomp; my @words = map { uc } /(\w+)/g; $f{$_}++ foreach @words; $wc += @words; }`. You call your script with `perl script.pl – Stefan Becker Feb 14 '19 at 03:19
  • I am working the points you have supplied me with. Will return. – Ahmet Emre Harsa Feb 14 '19 at 03:25
  • Any possible remarks on the latest version and the issue that might be of some - *ahem* very appreciated - use? @Silvar – Ahmet Emre Harsa Feb 14 '19 at 04:20
  • Couldn't refer you at above comment @StefanBecker – Ahmet Emre Harsa Feb 14 '19 at 04:22

2 Answers

A quick test on the command line with this example shows one correct syntax for passing in a word and a hash reference to a function:

use strict;
use warnings;
use v5.18;
sub foo {
    my $word = $_[0];
    shift;
    my $hsh = $_[0];
    say $word; say $hsh->{$word};
};
foo("x", {"x" => 4});
# prints x and 4

This treats the argument list as an array, reading the first element and then shifting it off each time. Instead, I would actually suggest getting both arguments at the same time: my ($word, $hsh) = @_;
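
A minimal sketch of that list-assignment form, using a hypothetical sub name `lookup` and made-up sample data (neither comes from the question):

```perl
use strict;
use warnings;

# Hypothetical helper: unpack both arguments in one list assignment.
sub lookup {
    my ($word, $hsh) = @_;       # @_ is left untouched, nothing is shifted off
    return $hsh->{$word};
}

my %freq  = (x => 4, y => 1);    # illustrative data only
my $count = lookup('x', \%freq); # the backslash passes the hash by reference
print "$count\n";
```

The order of the variables on the left simply has to match the order of the arguments at the call site.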

And your syntax for accessing the hash ref elements may well be correct, but I find it easier to remember the syntax which is shared between C++ and Perl: an arrow means dereferencing. Plus, you know you'll never accidentally copy the data structure when using the arrow syntax.
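
To illustrate the copying point, here is a small sketch with made-up data. Assigning `%$ref` to a new hash copies it, so changes to the copy are not visible through the reference, while the arrow form works on the original:

```perl
use strict;
use warnings;

my %freq = (lorem => 1);   # illustrative data only
my $ref  = \%freq;

my %copy = %$ref;          # dereferencing the whole hash into a new one copies it
$copy{lorem}++;            # changes only the copy
$ref->{lorem} += 10;       # arrow dereference modifies the original %freq

print "original: $freq{lorem}, copy: $copy{lorem}\n";
```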

piojo

Your fixed version is not much better than your first one. Although it passes the syntax check, it has several semantic errors. Here is a version with the minimum number of fixes to make it work:

NOTE: this is not how you write it in idiomatic Perl.

#!/usr/bin/perl
use warnings;
use strict;

sub hash_4Frequency($$) {
    my($ref2_Hash, $hashWord) = @_;

    print $ref2_Hash -> {$hashWord}, "\n";
}

my %f = ();  # hash that will contain words and their frequencies
my $wc = 0;  # word-count

while (<STDIN>)
{
    chomp;
    while (/(\w+)/g)
    {
        $f{$1}++;
        $wc++;
    }
}

hash_4Frequency(\%f, $ARGV[0]);

Test output with "Lorem ipsum" as input text:

$ cat dummy.txt 
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor
incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat.
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.

$ perl <dummy.txt dummy.pl Lorem
1
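
For what it's worth, the `Can't use string ("0") as a HASH ref` error from the second version can be reproduced in isolation. The sketch below uses made-up variable names and assumes the cause is the assignment `my $ref2_Hash = %_;`, which evaluates a hash in scalar context instead of taking a reference to it:

```perl
use strict;
use warnings;

my %freq;                   # empty hash, standing in for %_ in the question
my $not_a_ref = %freq;      # scalar context: yields 0 for an empty hash
my $real_ref  = \%freq;     # backslash: yields a reference to the hash

print "scalar context gives: $not_a_ref\n";
print "reference type is: ", ref($real_ref), "\n";

# Dereferencing the plain scalar dies under "strict refs",
# so the attempt is wrapped in eval to keep the sketch running.
my $died = !eval { my $v = $not_a_ref->{word}; 1 };
print "dereferencing the scalar died: ", ($died ? "yes" : "no"), "\n";
```

Only the backslash form produces something that `->{...}` can be applied to.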

BONUS CODE: this would be my first stab at the given problem. Your first version lower-cased all words, which does make sense, so I kept it:

#!/usr/bin/perl
use warnings;
use strict;

sub word_frequency($$) {
    my($hash_ref, $word) = @_;

    print "The word '${word}' appears ", $hash_ref->{$word} // 0, " time(s) in the input text.\n";
}

my %words;  # hash that will contain words and their frequencies
my $wc = 0; # word-count

while (<STDIN>) {
    # lower case all words
    $wc += map { $words{lc($_)}++ } /(\w+)/g
}

print "Input text has ${wc} words in total, of which ",
      scalar(keys %words),
      " are unique.\n";

# return frequency in input text for every word on the command line
foreach my $word (@ARGV) {
    word_frequency(\%words, lc($word));
}

exit 0;

Test run

$ perl <dummy.txt dummy.pl Lorem ipsum dolor in test
Input text has 66 words in total, of which 61 are unique.
The word 'lorem' appears 1 time(s) in the input text.
The word 'ipsum' appears 1 time(s) in the input text.
The word 'dolor' appears 1 time(s) in the input text.
The word 'in' appears 2 time(s) in the input text.
The word 'test' appears 0 time(s) in the input text.
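
The counting line in the bonus code relies on `map` returning the number of generated elements when it is evaluated in scalar context, which is what the right-hand side of `+=` provides. A small sketch with a made-up input line:

```perl
use strict;
use warnings;

my %words;
my $wc = 0;

# Illustrative input only; the real script reads its lines from STDIN.
my $line = "The cat saw the other cat";

# The match in list context returns every word; map bumps each counter,
# and in scalar context map's result is the number of words processed.
$wc += map { $words{lc($_)}++ } $line =~ /(\w+)/g;

print "$wc words, ", scalar(keys %words), " unique\n";
```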
Stefan Becker
  • Please don't encourage beginners to the language to use prototypes. [Why are Perl 5's function prototypes bad?](https://stackoverflow.com/questions/297034/why-are-perl-5s-function-prototypes-bad) – Dave Cross Feb 16 '19 at 12:41
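
One concrete reason behind that advice: a `$` prototype forces scalar context on its argument, so an array passed to such a sub is silently replaced by its element count. The sub names below are hypothetical and only serve to contrast the two behaviours:

```perl
use strict;
use warnings;

sub with_proto($$) { return join ' ', @_ }   # prototype: two scalars
sub without_proto  { return join ' ', @_ }   # no prototype

my @pair = ('x', 'y');

# The prototype puts @pair in scalar context, so its size (2) is passed.
my $surprise = with_proto(@pair, 'z');

# Without a prototype the array is flattened into the argument list.
my $expected = without_proto(@pair, 'z');

print "with prototype:    $surprise\n";
print "without prototype: $expected\n";
```

Dropping the `($$)` from the subs in the answer above does not change how they behave for the calls shown there.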