-1

This is my current script to try and compare the words in file_all.txt to the ones in file2.txt. It should print out any of the words in file_all that are not in file2.

I need to format these as one word per line, but that's not the more pressing issue.

I am new to Perl. I am more comfortable with C and Python, and this is proving a bit tricky. I know my variable assignment is off.

 use strict;
 use warnings;

 my $file2 = "file_all.txt";   %I know my assignment here is wrong
 my $file1 = "file2.txt";

 open my $file2, '<', 'file2' or die "Couldn't open file2: $!";
 while ( my $line = <$file2> ) {
     ++$file2{$line};
     }

 open my $file1, '<', 'file1' or die "Couldn't open file1: $!";
 while ( my $line = <$file1> ) {
     print $line unless $file2{$line};
     }

EDIT: Oh, it should also ignore case (Pie is the same as PIE when comparing) and remove apostrophes.

These are the errors I am getting:

"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
Sinan Ünür
user3295674
  • Looks like you're on the right track. What is the problem? – mob Apr 28 '15 at 19:15
  • If I try to run it I get Odd number of elements in hash assignment at absent.pl line 6. Odd number of elements in hash assignment at absent.pl line 7. Couldn't open file2: No such file or directory at absent.pl line 9. – user3295674 Apr 28 '15 at 19:20
  • I am not exactly sure what that means since I'm new to perl (sorry if it's a silly question!) – user3295674 Apr 28 '15 at 19:20
  • It seems to me you are overstating your command of C and Python. – Sinan Ünür Apr 28 '15 at 19:56

4 Answers

1

You are almost there.

The % sigil denotes a hash. A file name is a single value, so it belongs in a scalar (the $ sigil), not in a hash.

my $file2 = 'file_all.txt';
my $file1 = 'file2.txt';

You need a hash to count the occurrences.

my %count;

To open a file, specify its name - it's stored in the scalar, do you remember?

open my $FH, '<', $file2 or die "Can't open $file2: $!";

Then, process the file line by line:

while (my $line = <$FH> ) {
    chomp $line;          # Remove newline if present.
    ++$count{lc $line};   # Store the lowercased string.
}

Then, open the second file, process it line by line, use lc again to get the lowercased string.

To remove apostrophes, use a substitution:

$line =~ s/'//g;  # Replace ' by nothing globally (i.e. everywhere).
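
Putting these steps together, a minimal sketch might look like the following. The helper name normalize and the in-memory sample data are assumptions for illustration, not part of the question; with real files you would pass the file names to open instead of the scalar references. The sketch builds the lookup from the words that are already known and prints the words of the full list that are missing from it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): strip the newline and
# apostrophes, then lowercase, so "Pie" and "PIE" compare equal.
sub normalize {
    my ($word) = @_;
    chomp $word;
    $word =~ s/'//g;
    return lc $word;
}

# In-memory stand-ins for the two files (assumed sample data).
my $known_data = "pie\ndont\n";          # plays the role of file2.txt
my $all_data   = "PIE\nDon't\nCake\n";   # plays the role of file_all.txt

# First pass: count every normalized word of the known list.
my %count;
open my $FH, '<', \$known_data or die "Can't open in-memory data: $!";
while (my $line = <$FH>) {
    ++$count{ normalize($line) };
}
close $FH;

# Second pass: the same handle variable can be reused once it is closed.
my @missing;
open $FH, '<', \$all_data or die "Can't open in-memory data: $!";
while (my $line = <$FH>) {
    my $word = normalize($line);
    push @missing, $word unless $count{$word};
}
close $FH;

print "$_\n" for @missing;   # one word per line
```

With this sample data the only word printed is cake, because PIE and Don't normalize to entries that are already in the hash.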
choroba
  • I tried solving it a bit here, would the second one also be $FH or something else? http://www.codeshare.io/GFhX1 (sorry, I feel like a two year old with perl) – user3295674 Apr 28 '15 at 19:48
  • @user3295674: You can use the same file handle if you aren't reading from the files in parallel. – choroba Apr 28 '15 at 19:55
1

Your error messages:

"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.

You assign a file name to $file2 and later write open my $file2 ... in the same scope. The second my $file2 declares a new variable that masks the first, which is what the first two warnings report. Then, in the body of the while loop, you use a hash %file2 that was never declared, so use strict rejects it with the "Global symbol" errors.

You should use more descriptive variable names to avoid conceptual confusion.

For example:

 my @filenames = qw(file_all.txt file2.txt);

Using variables with integer suffixes is a code smell.

Then, factor common tasks to subroutines. In this case, what you need are: 1) A function that takes a filename and returns a table of words in that file, and 2) A function that takes a filename, and a lookup table, and prints words that are in the file, but do not appear in the lookup table.

#!/usr/bin/env perl

use strict;
use warnings;

use Carp qw( croak );

my @filenames = qw(file_all.txt file2.txt);

print "$_\n" for @{ words_notseen(
    $filenames[0],
    words_from_file($filenames[1])
)};

sub words_from_file {
    my $filename = shift;
    my %words;

    open my $fh, '<', $filename
        or croak "Cannot open '$filename': $!";

    while (my $line = <$fh>) {
        $words{ lc $_ } = 1 for split ' ', $line;
    }

    close $fh
        or croak "Failed to close '$filename': $!";

    return \%words;
}

sub words_notseen {
    my $filename = shift;
    my $lookup = shift;

    my %words;

    open my $fh, '<', $filename
        or croak "Cannot open '$filename': $!";

    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            # Lookup keys are lowercased, so lowercase the word before testing.
            unless (exists $lookup->{ lc $word }) {
                $words{ $word } = 1;
            }
        }
    }

    return [ keys %words ];
}
Sinan Ünür
1

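As you mentioned in your question, it should print out any of the words in file_all that are not in file2.

The small script below reads both files in parallel and prints each line of file_all.txt that differs from the line at the same position in file2.txt. Note that this only matches the requirement when the two files list their words in the same order: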

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = qw(file_all.txt file2.txt);

open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";

while (<$fh1>)
{
    last if eof($fh2);
    my $compline = <$fh2>;
    chomp($_, $compline);
    if ($_ ne $compline)
    {
        print "$_\n";
    }
}

file_all.txt:

ab
cd
ee
ef
gh
df

file2.txt:

zz
yy
ee
ef
pp
df

Output:

ab
cd
gh
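
The line-by-line comparison above depends on matching words sitting at the same position in both files. A hedged alternative, assuming the goal is a difference that ignores line order, is to load file2.txt into a hash first and then test each word of file_all.txt against it. The sketch below reuses the sample words from above with the lines of file2.txt shuffled (an assumption for illustration) and reads from in-memory strings rather than real files.

```perl
use strict;
use warnings;

# Assumed sample data: same words as above, file2 in a different order.
my $file_all = join "\n", qw(ab cd ee ef gh df);
my $file2    = join "\n", qw(df pp ef ee yy zz);

# Record every line of file2 as a hash key.
my %in_file2;
open my $fh2, '<', \$file2 or die "Can't open in-memory file2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    $in_file2{$line} = 1;
}
close $fh2;

# Keep the lines of file_all that never appeared in file2.
my @only_in_all;
open my $fh1, '<', \$file_all or die "Can't open in-memory file_all: $!";
while (my $line = <$fh1>) {
    chomp $line;
    push @only_in_all, $line unless $in_file2{$line};
}
close $fh1;

print "$_\n" for @only_in_all;
```

For this data the sketch prints ab, cd and gh, the same three words as the output above, even though the second file is no longer aligned with the first.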
serenesat
0

The issue is the following two lines:

 my %file2 = "file_all.txt";
 my %file1 = "file2.txt";

Here you are assigning a single value, called a scalar in Perl, to a hash (denoted by the % sigil). Hashes consist of key/value pairs, conventionally written with the fat comma operator (=>) between key and value, e.g.

my %hash = ( key => 'value' );

A hash assignment expects an even number of elements because each key needs a value. You currently give each hash a single value, so Perl warns "Odd number of elements in hash assignment".

To store a single value, use a scalar, which takes the $ sigil:

 my $file2 = "file_all.txt";
 my $file1 = "file2.txt";
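
To make the distinction concrete, here is a small sketch contrasting the two kinds of variable. The names $filename and %seen and the sample values are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# A scalar holds exactly one value, such as a file name.
my $filename = 'file_all.txt';

# A hash holds key/value pairs, so its initializer has an even length.
my %seen = ( pie => 1, cake => 2 );

# A single hash element is itself a scalar, hence the $ sigil here.
print "$filename\n";
print "pie seen $seen{pie} time(s)\n";
print scalar(keys %seen), " keys\n";
```

Note that the sigil follows what is being accessed: %seen refers to the whole hash, while $seen{pie} refers to one value inside it.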
Hunter McMillen
  • I tried that and replaced the top two lines with the my $file... but I got "my" variable $file2 masks earlier declaration in same scope at absent.pl line 9. "my" variable $file1 masks earlier declaration in same scope at absent.pl line 14. Global symbol "%file2" requires explicit package name at absent.pl line 11. Global symbol "%file2" requires explicit package name at absent.pl line 16. – user3295674 Apr 28 '15 at 19:35